So it's my pleasure to introduce our speaker today at the College of Science and Engineering Seminar. "Distributional Models for Entities: The Case for Precision" is the title of the talk, and Sebastian Padó is our guest, so thank you, Sebastian, for joining us today. I'll give you a shortened version of the description, and then we'll obviously hear the real deal in a second. So entities, like Mozart, are ontologically distinct from concepts, like composer. Distributional methods are very good at capturing the fuzzy, graded meaning of concepts (for example, Italy is more similar to Spain than to Germany), but comparatively little attention has been paid to entities, which presumably call for a more precise representation. So here we're going to hear about a couple of studies that Sebastian has been involved with, and he'll go into a lot more detail. Sebastian Padó is a professor of computational linguistics at Stuttgart University in Germany. He studied in Saarbrücken (in front of linguists this is very stressful, because now I have to pronounce all the words precisely) and Edinburgh, receiving his master's in cognitive science in 2002 and his PhD in computational linguistics in 2007. After a postdoc position at Stanford, he became professor of computational linguistics at Heidelberg in 2010 and then in Stuttgart in 2013. His core research areas concern methods to learn, represent, and process aspects of natural language meaning from and in text. So I'll leave it at that. Thank you very much, Sebastian.

Thank you very much for having me, and thank you everybody for coming in the afternoon on a very nice early spring day here. The good thing about the nice introduction I just got is that you can already skip my first slide, which was supposed to introduce myself anyway, just to give you a rough idea of who we are and the kind of research context we work in in Stuttgart.
We are one of those German universities that actually have an institute dedicated to natural language processing; we do text-based research and speech-based research, going into phonetics as well, and we have dedicated bachelor's and master's programs there. My group is called Theoretical Computational Linguistics, although you shouldn't put too much weight on that particular name, because the other group, for example, is called Foundations of Computational Linguistics, and if you ask yourself what the difference is between the foundations of something and the theory of something, well, these are mostly just two convenient names that divide up the department and allow everybody to do whatever they feel like doing. There are a couple of people involved here, and the person responsible for most of the work I'll present today is the one in the left-hand corner, Abhijeet Gupta, who was a PhD student working on this, and then we have a collaboration with Gemma Boleda from Barcelona and Marco Baroni, who was at Trento and is now at Facebook AI Research, changing the world, I guess. Okay, so what's the background of the stuff I'm going to be talking about today? The general idea that a substantial part of my own research builds on is what's generally called distributional semantics in natural language processing. That goes back to ideas from structural linguistics, and if you really want to go back in history, you end up with de Saussure, around the turn of the 19th to 20th century. But the more substantial initial moment, I guess, came with the structuralists of the 1950s and 60s like Firth and Harris, who said that you can talk about the meaning of words by observing how a word is actually used in actual conversation.
I mean, that may sound trivial to you, but if you compare it to the more theoretical strand of linguistics, it's actually a pretty revolutionary idea: rather than dictating what words mean, you let how people actually use the language define what words mean, right? Now, these guys recognized, essentially, that the only practical, good-coverage way (as we would put it today) to talk about word meaning is to look at the usage of words in communication. You can also trace that back to Wittgenstein's philosophy of language. What is a game? It's very hard to give a definition of "game", yet everybody knows what a game is, and if two people successfully use the word "game" in their conversation, then we can tell what it means. Okay. Now, from a computational linguistics or NLP point of view, the nice thing about distributional semantics, or the distributional hypothesis, is that it gives you a very simple and almost foolproof way of representing the meaning of words if you just have a big chunk of text; you don't even need any particular linguistic analysis of it. So in the simplest case, what you do is take a big chunk of text, for example Wikipedia or something like that, and build what's called a bag-of-words model, which means you look up all the occurrences of the words you're interested in and ask: what other words do you see there, a couple of positions to the left or to the right? So if you're interested, for example, in what the usage perspective has to say about the meaning of individual country names, then you can look at the contexts in which, for example, Italy occurs, and you see that Italy occurs with "sunny" but not so often with "beer"; Spain occurs with both "sunny" and "beer"; and with Germany it's just the other way around: not so much "sunny", but far more "beer". Okay.
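As a minimal sketch of this counting step (the toy corpus, window size, and word list below are invented for illustration, not the setup from any particular study):

```python
from collections import Counter

def cooccurrence_counts(tokens, targets, window=2):
    """For each target word, count the words seen within `window`
    positions to its left or right (a simple bag-of-words model)."""
    counts = {t: Counter() for t in targets}
    for i, tok in enumerate(tokens):
        if tok in counts:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tok][tokens[j]] += 1
    return counts

# A tiny made-up "corpus"; real models use something like Wikipedia.
tokens = "sunny italy sunny spain beer spain beer germany beer germany".split()
print(cooccurrence_counts(tokens, {"italy", "spain", "germany"}))
```

The resulting count table is exactly the word-by-context matrix described above: one row of context counts per target word.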
So what's nice about this, too, is that this kind of vector representation, where the column vectors serve as representations of the country names, can also be given a geometrical interpretation. The context words, the rows in the matrix, can be thought of as dimensions of a high-dimensional space, and then the meaning of a word is essentially just one point in this high-dimensional space. Then you can talk, for example, about cosine similarity, or any other similarity measure you like, in order to characterize which words are more similar and which words are less similar. And here, for example, you can see at one glance from this space that Italy and Spain are more similar to one another than either is to Germany. Okay. So, to come back to the idea: if you can observe the occurrences of words in a text corpus, this allows you to infer something about the underlying semantic similarity, if that's what you want to know. And this is something that quite a bit of current analysis in computational linguistics builds on, because it does allow you, as I said, to build large-scale word representations in an unsupervised way. And it has been shown to capture many different aspects of word meaning. As I said, there's the whole vagueness and fuzziness issue; words have multiple meanings, and you can try to capture this distributionally too, as a kind of clustering task, for example, and so on and so forth.
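The geometric view can be sketched like this (the count vectors are invented for illustration, with the dimensions ordered as "sunny", "beer"):

```python
import math

def cosine(u, v):
    """Cosine similarity between two count vectors: the cosine of
    the angle between the corresponding points in context space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy column vectors over the context dimensions (sunny, beer).
italy   = [10, 2]
spain   = [8, 4]
germany = [1, 9]

print(cosine(italy, spain))    # high: similar context profiles
print(cosine(italy, germany))  # lower: mostly different contexts
```

With these made-up counts, Italy and Spain come out more similar to each other than either does to Germany, mirroring the example on the slide.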
And what's nice is that if you see computational linguistics mostly as a way of doing linguistics in a practical, computational way, then this allows you to give data-driven accounts of many concepts that linguists care about: lexical relationships, selectional preferences, and even processing aspects, for example priming effects, where you show people one word and then a related word and they recognize the second word faster; that is something you can also explain relatively well here. People did this, I would say, roughly from the 1980s, when people first started looking at large-scale corpora in computational linguistics, up until relatively recently, really just by counting. Then the neural NLP revolution came along a few years ago, presenting what was seen as essentially the first deep learning models of distributional semantics. There, the idea is that you're not just counting and putting those counts in a matrix; instead, you learn an underlying model that then generates the observed co-occurrence counts with a hopefully high degree of accuracy. On a mathematical level, that doesn't really change the picture, because you can think of this more or less as a kind of dimensionality reduction, and that's about it. So I want to mostly abstract away from that for the purposes of this talk. So what are the problems then? (This is not supposed to happen; this is apparently what happens when you import a PowerPoint presentation into Keynote, but here we go.) A well-known limitation of the distributional approach is that it tends to be weaker on the precision side. So you learn a lot of stuff about a lot of words, but you can't really rely on all of it.
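To make the dimensionality-reduction remark concrete, here is a pure-Python sketch of the simplest possible case: reducing a toy co-occurrence matrix to a single dimension via power iteration on its top singular direction. The counts are invented, and real models reduce far larger matrices to a few hundred dimensions rather than one; this only illustrates the mathematical idea.

```python
import math

def top_singular_direction(M, iters=200):
    """Power iteration on M^T M: returns the dominant right singular
    vector of M (a list of rows), i.e. the single direction in context
    space that best reconstructs the co-occurrence counts."""
    n = len(M[0])
    v = [1.0] * n
    for _ in range(iters):
        # w = M^T (M v), then normalize
        Mv = [sum(row[j] * v[j] for j in range(n)) for row in M]
        w = [sum(M[i][j] * Mv[i] for i in range(len(M))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Rows = words (Italy, Spain, Germany), columns = contexts (sunny, beer).
counts = [[10, 2], [8, 4], [1, 9]]
v = top_singular_direction(counts)

# One-dimensional "embedding" of each word: its projection onto v.
embedding = [sum(c * x for c, x in zip(row, v)) for row in counts]
print(embedding)
```

Even in one dimension, Italy and Spain land closer to each other than to Germany: the low-dimensional model preserves the similarity structure of the counts, which is the sense in which learned embeddings "don't really change the picture".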
So, for example, distributional approaches find semantic relations when they are there, but actually that's formulated the wrong way around. If a distributional approach tells you that two words are semantically related, then you can be fairly sure about that, but such approaches also miss lots of semantic relations. And this is something that different people have shown over the years. For example, there was a very nice study in 2009 that did a comparative evaluation of knowledge sources for finding inference relations between words: for example, a country is a location, a chair is a piece of furniture and a table is also a piece of furniture, a window is an opening, these kinds of hierarchical inference relationships. And what they did was look at various lexical inference resources, and without going into too much detail, you can roughly say that the resources at the top of the table are ontologies that are mostly hand-built and try to give you a very precise idea of