All right, thank you. So today I'll be describing an ontology that's been developed for event understanding. This is part of a larger project with Drs. Stephen Tratz, Clare Voss, Jeffrey Micher, and others. So today I'll be focusing on natural language processing, or NLP, resources that support the machine learning of event semantics. And I'm focusing on events because events not only tell us what's happening, but their structures tell us who is doing what to whom, when, where, and why. Understanding events has been tackled in natural language processing through the development of corpora that are manually annotated with semantic role labels, a markup of who is doing what to whom. These corpora serve as training data for automatic semantic role labeling, and that in turn supports other NLP applications like information extraction, question answering, and summarization. But you might ask why we really need manually annotated semantic roles, especially in English. Can we do this automatically, given that in English the subject of the sentence is usually the doer and the direct object after the verb is usually the done-to? Well, although this is the dominant pattern in English, we actually see a wide variety of exceptions, like the one seen in this comic. The waiter tells the two ladies, "I'm sorry, we only serve men in this room," with the intended meaning, of course, that men are the recipient. But following the dominant pattern in English, we would assume that men are the direct object, as these ladies do, who respond, "Good, bring us two." So having semantic role labels, like recipient and theme, helps our automatic systems to disambiguate these two possibilities. And while the value of semantic role labeling is agreed upon in natural language processing, the exact set of labels is the subject of much debate. So while one approach might call this a recipient, another might call it a benefactor, a destination, or even just the entity served.
Despite the successes of automatic semantic role labeling, or SRL, we see that existing SRL resources have a variety of limitations. First, they each differ in what events and event participants are grouped together as similar under the same label. They have disparate formats and annotation schemas, which prevent us from unifying these resources. And we'd really like to unify these resources because this type of manually annotated training data is very expensive and time-consuming to produce. And we see that systems that are only trained on one data set or another tend to suffer from data sparsity and domain adaptation problems. Finally, while these resources do tell us about which events are similar to one another, we find that they don't give us much information on other potentially mission-relevant relationships between events, like causal and temporal relationships. So with our ontology, we'll work to address each of these limitations. This research spans an international group of advisors that specialize in NLP, theoretical linguistics, and ontologies. While I'm the technical lead in implementing the ontology, Dr. Martha Palmer of the University of Colorado Boulder is the academic lead. Dr. Palmer was my advisor at the University of Colorado Boulder, where I received my joint PhD in Linguistics and Cognitive Science at the end of 2014. So we see ARL's niche in this research as leveraging existing work but augmenting it by opening it up for analyst users in the loop. And I'll tell you a bit more about that later. And why am I in particular working on this project? Well, I started working as an annotator on an SRL resource over 10 years ago. And I continued to work on refining and expanding these resources throughout my time as a graduate student.
So while I think that there are a wide variety of challenges that the event ontology can help to address within autonomous systems, the Army challenge that I'd like to focus on today is that our analysts are faced with overwhelming amounts of unstructured, noisy, but potentially very valuable information. So how do we provide technology to allow them to determine automatically what's happening? This supports the ERA gap on AI and machine learning and the following KCIs and CCEs. So as I mentioned, this research is part of a larger basic refresh project that's in its first year. The goal of this refresh project is to develop an end-to-end information extraction pipeline that's not only designed for analyst use but also allows for an analyst user in the loop. On the left side of this pipeline, the components are intended to help us extract knowledge from data. So this is very much in line with the goals of the SIIS pillar of our information science campaign. On the right side of this pipeline, we're looking at how we can present this knowledge to our analysts to support their decision-making. So this is very much in line with the goals of the HII pillar. In tomorrow's HII tab briefing, Dr. Stephen Tratz, who's right back there, will be presenting this full pipeline in more detail. Today, I'll just be focusing on the event ontology component, which I call the Rich Event Ontology, or Rio. To give you a bit of an outline of where we're going, I'll start by telling you about the information that's in the ontology and our approach in building it. And then I'll go into a couple of research areas, the first of which aligns with SIIS: can we align these disparate SRL resources using the ontology to provide for increased coverage of the events that are recognized within our pipeline? On the HII side, can we use the ontology and its relations, including temporal and causal relations, to allow our analysts to refine their queries about events? OK, so let's jump in.
You see here the overall structure of Rio, which is actually a set of five ontologies that can be imported together into one larger ontology. In the center, you see the main reference ontology, which is implemented in the Web Ontology Language, or OWL. This is the conceptual backbone of our ontology, where the concepts are generic event types. They're related to one another not only through typical superclass-subclass is-a relationships, but also temporally and causally, using fine-grained combinations of before, during, after, and preconditions, causes, and results. This is information that's not found in any existing independent SRL resources. Now, what makes our event ontology unique in comparison to some other event ontologies that exist is that we are using existing SRL resources as the inventories of terms that denote the concepts found in the ontology. Specifically, we're using FrameNet, VerbNet, the Automatic Content Extraction, or ACE, project, and its direct spin-off, the Entities, Relations, and Events, or ERE, project. So our first step was to convert each of these resources, which were previously in disparate XML formats, into OWL format. And then we create a linking model for each resource model that specifies the relationship between the generic event types seen in the main reference ontology and the types of event classifications that are made in each individual resource. We selected these particular resources because each tells us about related events and event participants, but each does so with a different granularity, or level and type of semantic information that's included. The differences in granularity can be seen quite clearly when we compare the semantic role labels that are used in the annotation schemas of each project. So ACE and its direct spin-off share the same 25 semantic role labels. VerbNet uses 40 role labels, while FrameNet uses over 10,000 fine-grained role labels like Cook and Food.
So we have approaches to the same problem where one is assuming that only 25 roles are needed to label all of the participants of events, and another is assuming that over 10,000 are needed. Despite these differences, the ontology allows us to combine and compare this information. And I think perhaps more importantly, it allows us to combine the annotations that have been done for each of these independent resources into a larger, more diverse training corpus. But given the differences amongst each of these resources, manually aligning them by associating them with particular nodes in the reference ontology has not been trivial. And I think perhaps the only thing that's been more challenging is actually coming up with the event nodes and the ontological structure of events within the main reference ontology. And why is that so hard? Well, I invite you to take a moment and think about all of the events in the world. There are a wide variety of events. And events are very nebulous. It's hard to say what even a single event is. So while ontological approaches lend themselves very well to structuring the world of objects, it's not so clear how to implement this structure over events. Fortunately, we're not the first to think about this problem. So why not use an existing upper-level ontology and integrate our SRL resources into one of those? For example, DOLCE, SUMO, Cyc, or BFO? Well, we found that there's an insurmountable gap between the approaches to event representation taken in these ontologies and the approaches to event representation taken in our SRL resources. Specifically, what we found is that these ontological approaches are focusing primarily on objects and giving us only very general representations of events, distinguishing events from objects primarily using spatial and temporal features.
Our SRL resources, on the other hand, focus on distinguishing event types based on their participant slots and the types of words that tend to fill those participant slots. So our ontology uniquely bridges the gap between these existing ontological approaches and the SRL approaches, combining the benefits of ontologies with the power of event participant patterns that we know our NLP tools can detect. So how exactly did we bridge this gap? Well, the very upper level of Rio aligns with the Descriptive Ontology for Linguistic and Cognitive Engineering, or DOLCE. So like DOLCE, our root node of entity splits into endurant and perdurant. Well, what are those? Endurants are those entities for which, at any given snapshot of time, we can understand that entity as a complete concept. So for example, this horse and this rider are both endurants. Perdurants, on the other hand, are those entities for which we can only understand a part if we're given a single snapshot in time. So for example, the sequence of this horse jumping over the fence is a perdurant. In our ontology, perdurants then split into stative perdurants, like love and know, which are thought to be a single homogeneous event throughout their initial, internal, and final sub-periods, and eventive perdurants, which are thought to be a series of events or states and a series of transitions between them. For example, the event of closing a door. Now, while I think that what we've captured here is very elegant, namely that everything that happens in the world can be broken down into states, transitions between those states, and the acts that effect those transitions, this type of information still doesn't map well to our SRL resources. But as we delve deeper into the ontology and the daughters under eventive perdurant, we start to see the defining distinction in classes shift towards patterns of event participants, or what we can call the neo-Davidsonian style of representing events.
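As a rough illustration of the upper-level split just described, here is a minimal sketch in Python. The class names follow the talk, but the dictionary layout and the helper functions `ancestors` and `is_a` are assumptions made only for this example; the actual ontology is implemented in OWL, not in Python.

```python
# Hypothetical sketch of the upper-level class split described in the talk.
# The real ontology is an OWL model; this dictionary is only an illustration.
SUBCLASS_OF = {
    "Endurant": "Entity",
    "Perdurant": "Entity",
    "StativePerdurant": "Perdurant",
    "EventivePerdurant": "Perdurant",
}

# Example instances taken from the talk's own examples.
INSTANCE_OF = {
    "horse": "Endurant",
    "rider": "Endurant",
    "horse_jumping_over_fence": "Perdurant",
    "love": "StativePerdurant",
    "know": "StativePerdurant",
    "closing_a_door": "EventivePerdurant",
}


def ancestors(cls):
    """Return the chain of superclasses from cls up to the root."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def is_a(instance, cls):
    """True if the instance belongs to cls directly or through a superclass."""
    own_class = INSTANCE_OF[instance]
    return own_class == cls or cls in ancestors(own_class)


if __name__ == "__main__":
    print(ancestors("StativePerdurant"))
    print(is_a("closing_a_door", "Perdurant"))
    print(is_a("horse", "Perdurant"))
```

The point of the sketch is only that these upper-level classes are distinguished by how an entity relates to time, not by who participates in it, which is why this level alone does not line up with the SRL resources.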
Davidson originally proposed a style of representation like this. There exists an event, and that event has at least one participant. It was Parsons that put the neo in neo-Davidsonian when he proposed augmenting Davidson's original proposal with semantic role labels. So for example, a caused change of location event, like "the player threw the ball to his teammate," would have a neo-Davidsonian style of representation like this. There exists a throwing event, the agent of that event is the player, the theme is the ball, and the destination is the teammate. So now we've moved from ontological type distinctions in our classes based on spatial and temporal features to a neo-Davidsonian style of representation that maps very well to our semantic role labeling resources, and we've bridged the gap between these two approaches. So let's turn to the current status of this project. As part of a basic refresh project in its first year, the ontology is still under development. But at this point, we've completed all of our OWL resource models with all of the events that are included in those resources, the event mentions that fall into a particular class, as well as all the participants associated with a class. Within our main reference ontology, we've provided coverage for all of the types of events that we find in our ACE and ERE resources. We've started with these because they're thought to be events of interest to the DoD community, for example, conflict and justice events. Across all of our ontologies, we now have over 16,000 unique event mentions. So that's our wide vocabulary of event denotations. I'd like to turn now to our two evaluation areas, which relate to our goals for the ontology within the information extraction pipeline. So our first goal is to provide increased coverage of the number of events that are recognized in our pipeline. So you can see here our ontology as it's implemented in Protégé, an open-source ontology editor.
I'd like to focus for the moment on legal action type events. One daughter is arresting events. So these generic arresting events in our reference ontology are associated with each of our resource models through the relation "has reference group." So arresting events have three reference groups: Arrest-Jail in ERE and in ACE, and the Arrest frame in FrameNet. Within each of these reference groups, we find words that could potentially denote this type of event or have been marked up in past annotations as denoting this type of event. And we can call these triggers of the event type. Okay, so stepping back for a moment, I'd like you to consider all of these legal action type events, so all of the daughters here. Suppose we had an IE system that was trained only on ACE data, which is very common because ACE is the benchmark data set used by NIST in evaluating information extraction systems. Within ACE data, we have 102 triggers marked up for these legal action type events. So we could recognize about 102 types of event triggers. Then if we use the ontology to combine all of our resources, we quickly move up to 389 recognized triggers for the legal action type events. Now, while I think that this is promising, we have to see the impact that this would actually have within an information extraction pipeline. So we're doing just that. We're using the ontology to incrementally add to the training data in our information extraction pipeline. In this early stage in our research, we've retrained a system from our partners at RPI using just the ACE data to obtain these baseline numbers for precision, recall, and F-measure on event trigger identification. At this very moment, we are working to retrain this pipeline using the combined ACE and ERE data.
And I would have loved to give you those numbers today, but we've run into some stumbling blocks given small differences in the two data sets and the somewhat brittle nature of this pipeline, a problem that we're working to address more generally within our refresh project. Nonetheless, you can see the qualitative difference that combining just the ACE and ERE data makes by examining some of our test set data, which are open-source documents relating to Kosovo that were selected by our analysts. You can see quite clearly a greater number of our event triggers, shown in red, on the right in the combined ACE plus ERE system. So turning now to our other evaluation area, we're looking at how the ontology can help users to refine their queries about events. In our future work, we'll be exploring this in user studies where we compare analysts' decision-making efficiency with and without the event ontology in our information extraction pipeline. Until this is complete, I'll just give you a sample use case today. So for a particular domain in the ontology, for example, the conflict and protest domain, we're using domain expertise to establish the typical temporal and causal relations between events and sub-events that form a larger scenario. In the case of protest, we used social science literature to establish the fact that, firstly, communication is a defining sub-event of protest. Protest by nature involves the communication of some calls for change. And taking sides is a precondition of protest. But we also found that group identity is a precondition of taking sides. But group identity alone is generally not enough. Generally, we also see some type of grievance or trigger for intergroup conflict. So if our analysts are interested in querying not just for protest events, but for events that might indicate a protest to come, they can query for the preconditions of protest, for example, taking sides and its associated triggers like endorse or oppose.
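To make that use case concrete, here is a minimal sketch in Python of how a precondition query of this kind might work. The relations and the two triggers come from the protest example just given; the dictionary layout and the function names `preconditions` and `precursor_triggers` are hypothetical and are not the ontology's actual query interface.

```python
from collections import deque

# Precondition relations from the protest example in the talk.
# The data structure itself is an assumption made for illustration.
PRECONDITIONS = {
    "Protest": ["TakingSides"],
    "TakingSides": ["GroupIdentity"],
}

# Only the triggers named in the talk are listed; other event types
# would have their own trigger lists in the resource models.
TRIGGERS = {
    "TakingSides": ["endorse", "oppose"],
}


def preconditions(event, transitive=False):
    """Return the preconditions of an event, optionally following chains."""
    direct = list(PRECONDITIONS.get(event, []))
    if not transitive:
        return direct
    seen = []
    queue = deque(direct)
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(PRECONDITIONS.get(current, []))
    return seen


def precursor_triggers(event):
    """Collect the trigger words of every precondition of the event."""
    found = set()
    for precondition in preconditions(event, transitive=True):
        found.update(TRIGGERS.get(precondition, []))
    return sorted(found)


if __name__ == "__main__":
    print(preconditions("Protest"))
    print(preconditions("Protest", transitive=True))
    print(precursor_triggers("Protest"))
```

An analyst query for "events that might indicate a protest to come" would then amount to searching text for the words returned by `precursor_triggers("Protest")`, rather than for protest triggers themselves.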
So in this way, the ontology allows users to query for events that are related to other events in explicit higher-order scenarios. So to wrap up, we've seen that Rio allows us to combine resources. And it also brings added value to those resources in the form of ontological structure. This allows users to refine or expand or alter their search queries using relations like temporal and causal relations. We've demonstrated that the ontology facilitates the combination of training data to improve the coverage of event recognition. And we hope that Rio will become a reference hub for the use and reuse of these SRL resources. And I do think that there's much interest in using these resources in NLP, given that recently we've seen a lot of promising approaches that combine these types of knowledge-based resources with distributional approaches like word2vec. For our Army Warfighter, we're leveraging existing resources to deliver state-of-the-art event recognition in an IE pipeline designed specifically for analyst use. In our path forward, we'll continue the steps of evaluation that I outlined for the ontology within the IE pipeline. And we'll also be expanding the ontology, firstly by integrating the object portion of the SUMO ontology, and then by expanding event type coverage given our analysts' input. We're going to start by looking at cyber attack events. In collaboration with folks in sensing and detecting, we're also evaluating the utility of the ontology in human activity recognition, exploring whether or not there's a level and type of event representation that's suitable for both text and video. Finally, we'll be moving into other languages, where surely some of the ontological distinctions that we've made will need to be re-evaluated, given that we were focusing primarily on English. We're considering incorporating Arabic VerbNet first. Thank you.