Hello, hi everyone, and welcome to my talk about extracting synonyms from bilingual dictionaries. I am Mustafa Jarrar from Birzeit University, and I did this work with other colleagues. Before presenting the paper, please give me two minutes to show you my university and a bit of my research. This is our campus. We have a beautiful campus on top of a hill, and this is how life on campus used to be, but not anymore. It's closed now, there are no students, and we are teaching over Zoom. This is me in the library with some heavy lexicons.
Most of our research is focused on building linguistic resources. These are the main linguistic resources we have built so far. We have a lexicographic database which includes about 150 lexicons that we have been digitizing, cleaning, and integrating over 10 years. We actually have the largest Arabic multilingual database now. We also have the Arabic Ontology, which is a formal Arabic wordnet. So far we have between 15,000 and 20,000 concepts. Some of them are totally done and some of them are still in the pipeline to be added. All concepts in the ontology are fully mapped to Princeton WordNet, to DOLCE, to BFO, and also to Wikidata. We also have a dynamic corpus annotated with many features. The three parts together form a big linguistic data graph.
I will show you quickly now. This is the portal where we have all the resources. If you search for a word, for example "time", you get results from 150 lexicons: synonyms, definitions, translations, etc. This is the dictionary tab, but you can also reach the ontology in this tab. You search, you get the notion of time, and you start expanding, or you go up. Time is an occurrence, and an occurrence is an entity. Entity is the main root, which is divided into five categories: object, occurrence, dependent entity, abstract information, etc. You can keep expanding the tree. We also have a morphology tab here. Okay, so back to our paper, which is about extracting synonyms.
As you know, the importance of synonyms is growing in many application areas in computer science and artificial intelligence. Synonyms are also essential parts of dictionaries, wordnets, linguistic ontologies, and others. There are different notions of synonymy. In word embeddings, words appearing in similar contexts are considered maybe synonyms. In dictionaries, they are closely related words. In wordnets, they are based on the substitutability of words in a context. In linguistic ontologies, they are equivalence classes. So these range from loose notions to strict notions of synonymy.
Before presenting the algorithm we developed to extract synonyms, I would like to mention that there are many areas doing tasks related to synonym extraction. In wordnet construction, people extract synonyms using other wordnets, machine translation, corpora, word embeddings, other dictionaries, etc., in order to build wordnets. There is also work in another area, discovering new translations from multilingual translation graphs, especially in RDF. Other people are trying to extract synonyms to enrich or to validate existing dictionaries.
So now our algorithm to extract synonyms. The input of the algorithm is a bilingual dictionary, which means a set of bilingual translation pairs, just pairs, and the output is bilingual synsets. There are two steps in the algorithm. In the first one, we build the translation graph from the dictionary and then we extract cycles. Nodes participating in cycles are candidate synsets, bilingual synsets of this form. And then we do some consolidation to improve these candidates. I will show you this in an example. In this example, we have a bilingual dictionary. This is just a table of two columns, nothing more. A word and another word. One term and another term, I would say.
This example, by the way, I extracted from the Arabic WordNet. I took these synsets and converted them into a bilingual dictionary, just a flat bilingual dictionary. Now we will use this to extract synonyms. So don't tell me we already have the synonyms here. This is just an example where I assume I have only these pairs, and then I build a graph from this bilingual dictionary.
So we take a word, for example ghaba, and look at its translations in the dictionary. As you see here: ghaba, forest, wood, woods. Then we take every English word and find its Arabic translations, like ghab and adghal for this word. Wood means ghab, adghal, et cetera. So we take every word in English and translate it into Arabic, and every Arabic word is translated into English, and so on. We stop if we find a cycle, which means we come back to the same word here. Or we stop if there are no more translations, like this one here. Or we stop when we reach level k.
Now, we are only interested in cycles, not in other information. We just keep the cycles here, and we will convert these cycles into synsets, bilingual synsets of this form. So nodes participating in cyclic paths are considered candidate synsets. For example, ghaba means woods, woods means ghab, ghab means forest, and then we move back to this. So this is a cycle. In this case we have two English words, woods and forest, like this, and we have ghaba and ghab here. And so on. If you take a long one, ghab, woods, adghal, forest, wood, and you go back to ghab, then you have this synset. We remove duplicates, because we generate many duplicates here.
Now we take these generated candidate synsets and start the consolidation. In this phase, we consolidate Arabic using English. So we want to consolidate these using these.
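The graph-building and cycle-extraction steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation from the paper: the function names, the interpretation of level k as the maximum number of nodes on a path, and the transliterated sample pairs are all hypothetical.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected translation graph from (arabic, english) pairs.

    Nodes are tagged with their language so that an Arabic and an English
    string can never be confused (an assumption of this sketch).
    """
    graph = defaultdict(set)
    for ar, en in pairs:
        graph[("ar", ar)].add(("en", en))
        graph[("en", en)].add(("ar", ar))
    return graph


def candidate_synsets(pairs, k=6):
    """Return candidate bilingual synsets as (arabic_set, english_set) tuples.

    Every simple cycle with at most k nodes contributes one candidate.
    Storing candidates in a set removes duplicates automatically.
    """
    graph = build_graph(pairs)
    found = set()

    def walk(start, node, path):
        for nxt in graph[node]:
            if nxt == start and len(path) >= 4:
                # Back at the starting word: the path is a cycle.
                ar = frozenset(w for lang, w in path if lang == "ar")
                en = frozenset(w for lang, w in path if lang == "en")
                found.add((ar, en))
            elif nxt not in path and len(path) < k:
                # Keep translating back and forth until level k.
                walk(start, nxt, path + [nxt])

    for node in list(graph):
        if node[0] == "ar":
            walk(node, node, [node])
    return found


# Hypothetical sample dictionary (illustrative transliterations only).
sample_pairs = [
    ("ghaba", "forest"),
    ("ghaba", "woods"),
    ("ghab", "forest"),
    ("ghab", "woods"),
    ("harash", "bush"),  # no cycle, so it yields no candidate
]
candidates = candidate_synsets(sample_pairs, k=6)
```

In this sketch a cycle needs at least four nodes (two Arabic and two English words), so a single pair that simply translates back and forth is not treated as a cycle.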
So we say: if the same synset in English has different Arabic synsets, we take their union. Here we have forest and woods, and woods and forest. This is actually the same synset. So we combine the words here, and we end up with this. And we do the same process for the rest. So instead of nine, we now have four. We take these four and we consolidate the English. So we go the other side. We consolidate the English synsets here. The Arabic is actually all the same, it's the same set in Arabic. So it means we have to union all of these English words into one synset. Okay. So there is no more consolidation here, but in case there is a need for more consolidation, we keep doing this until no more consolidation is found. Good. Now, this is the final output. So from the table, we end up with this synset. This is how we do it.
Maybe you ask why we did the consolidation like this. The consolidation is actually based on three main heuristics. The first: it's less likely for a bilingual synset to refer to multiple concepts. This means that if we have a bilingual synset, Arabic and English together for example, it typically refers to one concept. But it's possible for it to refer to more, which means polysemy. If polysemy or such cases exist, then the accuracy of the algorithm will of course be affected. The second heuristic is that it's less likely that a synset is a subset of another synset. So if we have cases like A, B, C, D, and then another synset A, B, C, the more of these cases there are, the lower the accuracy. The third one is that it's less likely for the same English synset to be translated into multiple Arabic synsets. So these are the three main heuristics.
Okay, how to evaluate the algorithm? As you know, evaluation is not easy for synonymy. It's known to be difficult. Our proposed methodology is the following.
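The consolidation phase described above (union the Arabic sides that share the same English side, then the English sides that share the same Arabic side, and repeat until nothing changes) might look something like this. It is a minimal sketch that assumes candidates are (arabic_set, english_set) tuples of frozensets; the helper names and the sample words are hypothetical and not taken from the paper.

```python
def merge_by(candidates, key_index):
    """Union the 'other' side of all candidates that share the same key side.

    key_index 0 groups by the Arabic set, key_index 1 by the English set.
    """
    merged = {}
    for cand in candidates:
        key = cand[key_index]
        other = cand[1 - key_index]
        merged[key] = merged.get(key, frozenset()) | other
    if key_index == 0:
        return {(key, other) for key, other in merged.items()}
    return {(other, key) for key, other in merged.items()}


def consolidate(candidates):
    """Repeat both consolidation passes until no more merging happens."""
    current = set(candidates)
    while True:
        step = merge_by(current, 1)  # same English synset: union the Arabic
        step = merge_by(step, 0)     # same Arabic synset: union the English
        if step == current:
            return current
        current = step


# Hypothetical candidates (illustrative transliterations only).
raw_candidates = {
    (frozenset({"ghaba", "ghab"}), frozenset({"forest", "woods"})),
    (frozenset({"ghaba", "adghal"}), frozenset({"forest", "woods"})),
    (frozenset({"ghaba", "ghab", "adghal"}), frozenset({"wood", "woods"})),
}
final_synsets = consolidate(raw_candidates)
```

With these sample candidates, the first pass merges the two entries that share the English side, and the second pass merges the result with the third entry because the Arabic sides are now identical, leaving one synset.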
So we used the Arabic WordNet, which, by the way, is very challenging because it contains a lot of polysemy. The Arabic WordNet has about 10,000 concepts. In order to evaluate the algorithm, we converted it into a bilingual table. From this bilingual table, we extracted bilingual synsets using our algorithm. And then we measured the difference between the extracted synsets and the original Arabic WordNet. The evaluation using precision, recall, and F-measure shows that we have about 82% accuracy, with 80 precision and 84 recall. This is with k equals six and with consolidation. If no consolidation is done, it's 47. Even with consolidation, if we go to more levels, we get less accuracy. And also remember that we didn't do any tuning, any language-specific treatment, or anything. And we did this evaluation on the Arabic WordNet, which is polysemous. That's it.
So, the conclusion. We have presented an algorithm to extract synonyms from flat bilingual dictionaries, which can actually be applied to other languages because we didn't do any tuning or any language-specific treatment, and we reached good accuracy, even with a challenging resource. In the future, we plan to fine-tune it. We have lots of ideas on how to fine-tune it. We will use part of speech, which we actually didn't use. And we will use other morphological features. There are specific things in Arabic we have to handle, like diacritics and inflections, which we didn't consider. And then we want to apply the algorithm to enrich the Arabic Ontology, especially since we have many bilingual dictionaries. Thank you very much for listening. And this is my email. Please email me if you have any questions. Or if you have some dictionaries you want to extract from, we will do it for you. Thank you very much.