Hello, good morning, good evening, wherever you are. I'm sure we all wish we could be together in South Africa right now. Thank you for giving me this opportunity to give this speech. I'm so excited to talk about wordnets and linguistic ontologies.

For those who don't know me, I am a professor of computer science at Birzeit University in Palestine. I have been working in natural language processing, ontology engineering, and the semantic web for over 20 years by now. This is me. Most of my research in the past 10 years has focused on building linguistic resources. And this is a picture of the beautiful campus of Birzeit University. By the way, Birzeit in Arabic means a well of olive oil, because the area is very famous for its high-quality olive oil.

In addition to olive oil, we also try to play an active role in building linguistic resources. The main linguistic resources we have built so far are these. First, a lexicographic database, which includes about 150 lexicons that we have been digitizing, cleaning, and integrating over the past 10 years. We believe it is now the biggest lexicographic database for Arabic. It includes classical lexicons, thesauri, glossaries, bilingual and trilingual dictionaries, and many others, in almost all domains. Also, we are building the Arabic Ontology, which is a formal Arabic wordnet, which I will show you today. All concepts in this ontology, by the way, are mapped to Princeton WordNet, to DOLCE, to BFO, and to the Wikidata knowledge graph, in addition to other lexicons. We also have several dialect corpora annotated with many morphological features and linked with the other lexical resources. The three types of resources together form a big linguistic data graph, which is available at this link.

Okay, so in this talk I will discuss three things. First, I will discuss why we need linguistic ontologies.
Then I will discuss some foundational differences between wordnets, linguistic ontologies, and application ontologies. And in the last, the third, part of the talk, I will show you the Arabic Ontology, which is a linguistic ontology.

We know wordnets are widely used in natural language processing and information retrieval for enhancing the retrieval of unstructured information. But there are new types of applications, with new demands, that use wordnets as ontologies for managing structured data, like knowledge graphs, Wikidata, and other applications. In fact, I recently found about 5,000 articles mentioning wordnets and knowledge graphs. This is despite the fact that knowledge graphs are a new research area. So there is a big demand.

As you see here in this small knowledge graph, this entity, for example, is a person, the inventor of the iPhone, a former CEO of Apple, etc. These green links are relations between instances, or between instances and literal values. In these yellow boxes here, we have thing, person, university, place, date, etc., which are types of entities, and which knowledge graphs share and use; not only one graph, other graphs may share the same types. So the issue here is that people want to use wordnet synsets as classes of things. They want to use wordnets as ontology classes. However, as we know, ontologies are typically rich axiomatizations, and they are application specific, while wordnets are general-purpose lexicons. And if we try to axiomatize wordnets, they become really rigid. The question that I will discuss with you today is how to build linguistic ontologies to better serve these new types of applications.

So, in this second part of the talk, I will show you the foundational differences between wordnets and ontologies. I will first examine synset versus concept, and what the difference between them is. As you see here, this is a small piece of a wordnet.
We know that a wordnet is made of synsets, and a synset signifies a linguistic concept, and a linguistic concept is a thought in our mind. In linguistics, we know that even individuals are seen as concepts. On the other side, an ontology is made of concepts as well, but the notion of concept here means classes of individuals. This means that concepts that do not have individuals cannot be part of an ontology, but they can be part of a wordnet. So synsets are not the same as concepts. However, in the Arabic Ontology I define a concept in the intensional sense, with an intensional interpretation, following Nicola Guarino's definition of conceptualization, which refers to the set of maximal states of affairs. I believe this definition is more suitable for this purpose.

Second, let's examine hyponymy and subsumption. Hyponymy between two synsets in a wordnet is added if native speakers accept a sentence like "B is a kind of A", while in an ontology it is a subset relationship. Because ontology concepts are sets of individuals, we have two sets of individuals, and subsumption means subset. Thus, for an ontology we say, for example, that every instance of table is also an instance of furniture in all possible states of affairs, or all possible worlds.

Let's look at synonymy. In a wordnet, if we can substitute one word for another in a linguistic context without altering the truth value, then the two words are synonyms. But in an ontology, synonymy is actually not really defined. Synonyms are generally seen as alternative labels or names of concepts. In the Arabic Ontology, I define the synonymy relationship as an equivalent-class relation, and therefore it is a reflexive, symmetric, and transitive relation.
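The set-based reading of subsumption and synonymy described above can be illustrated with a small sketch. This is only a toy model, not how the Arabic Ontology is implemented: the concept names, the instances, and the two "worlds" are invented for illustration, and a real ontology would state these relations as axioms rather than enumerate instances.

```python
from itertools import product

# Each "world" (state of affairs) maps a concept name to its set of instances.
# Concept names and instance identifiers are hypothetical.
WORLDS = [
    {"furniture": {"t1", "c1", "s1"}, "table": {"t1"},
     "sofa": {"s1"}, "couch": {"s1"}},
    {"furniture": {"t1", "t2", "s2"}, "table": {"t1", "t2"},
     "sofa": {"s2"}, "couch": {"s2"}},
]
CONCEPTS = sorted(WORLDS[0])


def subsumes(general, specific):
    """Subsumption: 'specific' is a subset of 'general' in every world."""
    return all(world[specific] <= world[general] for world in WORLDS)


def synonymous(a, b):
    """Synonymy as equivalent class: same extension in every world."""
    return all(world[a] == world[b] for world in WORLDS)


def is_equivalence(relation, items):
    """Check that a binary relation is reflexive, symmetric, and transitive."""
    reflexive = all(relation(x, x) for x in items)
    symmetric = all(relation(y, x)
                    for x, y in product(items, repeat=2) if relation(x, y))
    transitive = all(relation(x, z)
                     for x, y, z in product(items, repeat=3)
                     if relation(x, y) and relation(y, z))
    return reflexive and symmetric and transitive
```

With this toy data, `subsumes("furniture", "table")` holds while the reverse does not, `synonymous("sofa", "couch")` holds, and `is_equivalence(synonymous, CONCEPTS)` confirms the three properties mentioned in the talk.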
In other words, if we have two concepts with different terms, these terms are considered synonyms if and only if both concepts have the same extension, the same set of individuals. What I want to say is that when building wordnets, we think of linguistic context, while when we build ontologies, we think of classes and relations between instances. And because of these different views, there are some consequences that I'll show you shortly.

So now I will show you some of these consequences, or issues, that might not be correct from an ontology point of view, but might be acceptable if a wordnet is seen as a lexicon.

This is an example of a relation in WordNet that there is no benefit in including, as it is an implied relation and can be derived from other relations. If we say that to reflate is to inflate, and to inflate is to change, we don't need to say that to reflate is to change, because that relation is implied and can be derived.

This is another case, where we have morning star and evening star as two different senses, or concepts, in WordNet. These might indeed be different linguistic concepts, but ontologically it is the same instance that we see at different times. If you see me in the morning and if you see me in the evening, I'm still the same individual. So it's Venus that we see in the morning or in the evening. Should we consider it the same instance or not? From an ontological point of view, it is the same instance, while in wordnets it is considered as different concepts.

This next case is important; it relates to verbs and their verbal nouns. Verbs are linguistic rather than ontological categories. Look at this sense of the verb learn, which means to gain knowledge or skills. And this sense, the noun sense of learning, refers to the cognitive process of acquiring skills or knowledge. Actually, both refer to the same event of learning.
Whether we say he learns, he learned, he's learning, the learning he gained, and so on, we actually refer to the exact same event of learning. Ontologies capture the events that verbs denote, but not the verbs themselves. As I will show you later, the Arabic Ontology does not contain verbs at all. There are no verb senses. Instead, we added the verbal nouns, which are linked with verbs, but linked at the language level, not inside the ontology.

Now, another issue, which is related to the accuracy of the content. In this sense, imaginary and complex numbers are seen as synonyms, while in mathematics an imaginary number is only a special case of a complex number. This brings us to think about how to benchmark the accuracy of our content. Should it be according to speakers or according to mathematics? If it's according to speakers, then how do we know what speakers actually believe, and whether what they believe is actually correct or not?

Okay, I have another case related to accuracy and how we should benchmark it. Here I show you a sense where an Islamic calendar month is a month. The month here is defined as one of the 12 divisions of the calendar year, and the calendar year is a Gregorian year, which is different from the lunar year. So the content here is not really accurate.

In this slide, I list the major differences between application ontologies and linguistic ontologies. These differences are important to be aware of when we build linguistic ontologies that can play the role of wordnets. When I refer to linguistic ontologies, I mean ontologies that can play the role of wordnets, which are different from application ontologies.
Application ontologies are typically rich axiomatizations, while linguistic ontologies should be lightweight; otherwise they become too rigid for the language. Also, polysemy is not a problem in application ontologies, because people can invent labels for their classes as they like, while this is not possible for linguistic ontologies and wordnets, where polysemy is important to care about. Similarly, synonymy is not a target for application ontologies, while it is important for linguistic ontologies. And more importantly, there is how knowledge is benchmarked. For application ontologies, the knowledge is benchmarked against the application's knowledge, because when we build an application ontology we have a certain application, or a class of applications, that will use it. When we build linguistic ontologies and wordnets, the knowledge should be benchmarked against general knowledge, because we don't have application requirements at hand to think about when we build them. They are general purpose.

Okay. Now, allow me to move to the third part of the talk, to present the Arabic Ontology, which is being developed at Birzeit University. We can define the Arabic Ontology as a lightweight formal specification of the concepts that Arabic terms convey. In other words, it is a formal Arabic wordnet. All concepts in this ontology are mapped, or linked, with the senses in Princeton WordNet and with nodes in the Wikidata knowledge graph. We also have mapping correspondences between our concepts and BFO and DOLCE, and also with many of the lexicons we have.

Our progress so far. I really hate to talk about numbers, because the numbers keep changing as we work on them. We have about 1,800 concepts that are fully done, most of them at the top levels. We also have about 17,000 concepts that are partially done, which means that they are still subject to change.
But we show them in the ontology portal, and when we show them, we mark them with these orange lines here. These gray lines mean the concept is among the 1,800. As you also see, we have some English in the ontology, especially at the top levels. But providing English is not our target, actually. It is just for communication and readability.

The methodology we followed to build this ontology is top-down and bottom-up at the same time. Maybe it's called middle-out. We first built the top-level concepts, which took us a long time. Then we identified the concepts from the lexicons we have, linked them with the top levels, and we keep revising and changing. Sometimes this requires a major change or a rewrite, until all is done. This is the link to the ontology. This is a depiction of only the top three levels of the ontology, but the full description can be found in this paper, which appeared recently in the Applied Ontology journal, and which shows how these concepts are mapped to BFO and to DOLCE.

Let me switch to show you the ontology. This is our ontology portal, which is a lexicographic search engine that contains the Arabic Ontology and 150 lexicons. By the way, some of the lexicons are really big. We can search directly, or there is a link to the ontology tree. If we search, for example, for time, we see results coming from the lexicons and from the ontology. These are coming from the lexicons, and these are the results from the ontology.

Let me go back to browse the ontology tree. This is the root node, which is entity, and entity is divided into object, occurrence, dependent entity, abstract, and information. We can expand to see other levels: an object is a physical object or a social object, a physical object is divided into several categories, and one can keep expanding. I just want to show you that this is the synset.
These are the Arabic words and the English words, which are divided by this bar. This is the English gloss, and this is the Arabic gloss. There's an example here in Arabic. And this is the concept ID. This is what we call the concept profile. If you click, you see the concept profile for object, which contains some formalisms, sometimes mappings to external resources, identity criteria, examples of instances, etc. We also sometimes have relations; physical object, for example, has some relations with other concepts, and so on. So, generally, you can browse, or, by the way, you can also search, for example for virus, and you get it directly in the ontology. We have three types here. You get it directly from the ontology. By the way, more about this can be found on the about page. We also have some frequently asked questions, etc. Let me go back to the slides.

Now, I just want to show you that there are some major questions and design issues that we faced at the beginning. These are the basic questions: what semantics should the ontology capture and adhere to, and how should we benchmark the correctness of the ontology content? There are some options, and other questions too. For example, should concepts in the ontology be defined or classified based on what Arabic speakers believe? Should we adopt a certain lexicon and formalize it, so that the ontology adheres to this lexicon? Should it be based on what the scientific literature says, like mathematics, biology, etc.? Should we build the ontology based on what we, the ontology builders, believe, so that it reflects our own view?

So this is what we decided. Our benchmarking methodology consists of three levels of references. First, we try to adhere to scientific knowledge.
For example, when we classify stars, planets, satellites, etc., the correctness of our classification is benchmarked against advances in astronomy. When we classify organisms, we try to rely on biology, etc. For example, what is a virus? It is not an organism in the Arabic Ontology, unlike in other lexicons, because biology does not even consider it an organism. Second, when defining, for example, types of states and government systems, which science does not define, we try to rely on subject-matter knowledge, such as law and political science. Third, in cases where neither science nor subject-matter experts' knowledge can help us, for example when we classify types of hate, types of love, etc., we try to investigate and rely on common-sense knowledge that can be found in some good lexicons and maybe some appropriate literature. So, three types of references.

Okay. So how do we formalize glosses? This is a gloss, as you see. A gloss is used to define a concept in an informal but controlled way. First, a gloss should start with the supertype of the concept being defined. For example, if we define the gloss of object, it should start with entity, because the supertype is entity. If we define physical object, its gloss starts with object. Second, we list the most distinguishing characteristics that specialize this concept from its supertype, and also from its sibling concepts. And third, the distinguishing characteristics are written in the form of a sequence of propositions.

Okay, this is about the evaluation of the comprehensiveness of the ontology content, which was not an easy task. We need to know how far the current version of the ontology is able to top any concept. We are evaluating the ability of the top levels to accommodate concepts, not how many concepts we have. This is important. They are different.
To do this, we did an experiment by classifying 1,830 terms, or concepts, that we found in the Al-Jurjani lexicon, which is an old lexicon that contains the most abstract notions in Arabic, in many domains. We excluded some because we didn't understand them. We would like to know whether each of these concepts can have a place in the ontology. Ideally, each of the 1,830 concepts is placed either as equivalent to an existing concept or at a leaf node.

These are the results, as shown in the figure. The blue numbers are the numbers of concepts that are mapped as equivalent. For example, the 2 means that there are two concepts in Al-Jurjani's definitions that are equivalent to object, which is good; there is no problem. The green numbers, like 107 here, mean that we have 107 concepts that are under information entity, but we cannot place them under information object or information realization. This means that we need to extend or revise this level under information entity, so that these other concepts can be placed. Forget the blue numbers, because they are equivalents, which means we already have them. For the green numbers, it would be better if all of them were at the leaf nodes. If they are not under the leaf nodes, it means that some revision is needed. I would say about 90% of the concepts we used in the experiment are correctly placed in the ontology and 10% are not, which means we need to work on and revise some of the levels.

I will not spend much time here. I would just like to say that the data model of the ontology is similar to the wordnet model. In addition, we included the era in which the sense is used and the lexical type: whether it is a technical sense, classical Arabic, Modern Standard Arabic, or dialectal.

This is about URL design. This is an important issue, which I believe lexicographers may not care much about.
It is related to how concepts are identified. In the Arabic Ontology, each concept is given a globally unique URL, or, better to say, URI. Each semantic relation is also given a URL.

Okay, as I told you earlier, we have a huge lexicographic database, which includes many types of lexicons, thesauri, glossaries, etc., and we are currently trying to link each lexical concept in each lexicon with the ontology concepts. We use this mapping framework that we developed. The idea is that by mapping all lexicons to the ontology, all lexicons become semantically linked with each other. So far we have about 13,000 mappings.

Okay, the ontology and the lexicons we have are all represented using these two main standards: the W3C best practices for publishing Linked Data, and the lemon model. Following these two standards is important not only to be compatible with external resources and to be part of the Linguistic Linked Open Data cloud; it is also important to enable linking with knowledge graphs, because many graphs will be using the ontology.

Okay, to sum up, I would like to stress the following points. First, we have a new demand for using wordnets to manage structured data, like knowledge graphs, and this is an important area. Second, there are some issues that we need to consider when reusing wordnets as linguistic ontologies. And the Arabic Ontology is an example of a linguistic ontology that can play the role of a wordnet and an ontology at the same time. Last but not least, I believe there are some standards that are important to follow in digital lexicography when developing wordnets.

This is the end of my talk. Thank you very much for listening. I'm happy to receive your questions directly or later by email. Also, please don't hesitate to email me if you need the ontology or some of the resources. As for my slides, I will post them on my web page. Thank you very much.