of whether a relationship is there or not, okay? And this actually turned out to be true. So the precision of these resources is, I mean, not quite 100%, but at least about 50%, whereas the recall, so essentially the percentage of the items we wanted them to make good predictions for, is relatively low here. There is one resource with a relatively high recall, while all the others are around 10% or lower. Whereas if you look here at the bottom, these are two distributional approaches. You see that they contribute a lot of the relationships that we're interested in, but the precision is actually just about half the precision of the ontological resources. So you can't really rely on the information they give you. And quite recently, people have tried to build a kind of hybrid distributional-symbolic inference system, and they essentially had to, just a second, okay, just to finish that sentence, they had to throw out the distributional predictions from the system again, because they just got too many false ones.

So my question is whether this holds true even if the data sets are very, very large. So it's not due to lack of data, it's due to some other inherent property of text?

Yes, yes. And really the reason for that is that the assumption that all of the words around the target word you're interested in contribute to the same extent to the meaning of that word is just an oversimplification. So I think a relatively good way of characterizing the situation is that if you observe the distribution, you can capture coarse-grained meaning aspects fairly well, but not the more fine-grained ones.

Which then brings me, switching to this machine here, to the real topic of the talk, namely entities, or named entities. This has been something of an overlooked area in computational linguistics, because most work in classical semantics and in modeling meaning is on what linguists call common nouns, or concepts. But there is also another ontological category, namely entities, and these are of course the things that you find, for example, in Yago or in Freebase or in Wikipedia, essentially all of these collaboratively built knowledge bases that people care about these days. Now if you think about them in formal semantic terms, in terms of type theory and lambda calculus, for example, then you would say that those are two fundamentally different things, because the meaning of a concept like composer is a set of entities, okay? So it is a function from entities to truth values, whereas the semantics of an entity like Angela Merkel, for example, is really just an individual. So from a formal point of view, they should be very different. Now what would we say from a data-driven, empirical point of view, what should they look like?
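In standard type-theoretic notation, the contrast being drawn here can be sketched roughly as follows; this is a textbook-style gloss, not something taken from the slides, and I use Italy as the entity example since it comes up in a moment:

    % Concept vs. entity in formal semantics (type-theoretic sketch)
    % e = type of entities, t = type of truth values
    \[
      [\![\text{composer}]\!] \in D_{\langle e,t\rangle},
      \quad\text{e.g.}\quad
      \lambda x.\ \mathrm{composer}(x)
      \qquad\text{(a function from entities to truth values, i.e. a set)}
    \]
    \[
      [\![\text{Italy}]\!] \in D_{e}
      \qquad\text{(a single individual, not a set)}
    \]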
And now the interesting starting point of our investigation was essentially that many entities seem to lead a kind of double life, in the sense that they do have fuzzy properties like concepts. I actually started with an example that used an entity, Italy: you just know that Italy is somehow coarsely correlated with concepts like wine and beach and maybe Rome. But they also have a lot of precise attributes, which I'm going to call referential attributes, because those really only make sense if you talk about entities that have a unique referent in the world, and they look somewhat different, right? They look something like this, and this is the kind of representation that you would find for Italy in Wikipedia, for example. Wikipedia would tell you that Italy has a population of sixty-something million, an area of 300,000 square kilometers, that the language spoken is Italian, blah, blah, blah, and so on and so forth. So it seems that these entities really have those two different perspectives coming together, and what we asked in our research was the overarching question of whether standard distributional approaches, such as I sketched at the beginning ten minutes ago, are also able to model the meaning of entities in a similar way as they do for concepts. So what I'm going to talk about is a first study where we model referential attributes of entities, and then some really recent work where we also model relationships between concepts and entities. But just in case everybody falls asleep at some point, the answer is yes, with some qualifications; they work surprisingly well.

Okay. So just to make it more concrete what we're trying to do: we're trying to take something like a distributional vector, so a bag-of-words vector like this one, and then try to use it to predict referential attributes of entities, okay? And now we actually come to a point which I think links up well with the question that was just asked: what is the role of context here? It's really obvious that such referential attributes are learnable relatively easily, for example by defining specific context patterns, or by using open information extraction technology, where people write something like regular expressions over partially linguistically analyzed sentences to extract tuples of related phrases. So here's an example: if you want to predict the population attribute for Italy, and you specify that you are looking for specific patterns like "the population of X is Y" or "Y has X inhabitants" or "over X people live in Y", something like this, then of course it's mostly trivial to extract that information from corpora. And here I guess the answer is that if your corpora are big enough, then one of those patterns will match eventually, because not every text you might have in your corpus is, of course, an encyclopedia text, but even normal people occasionally talk about, say, the population of Italy.

But this, sorry? I didn't quite get why exactly you needed the bag-of-words representation of the Italy concept to do that.

Okay, so the answer really is: what we do is, given the text, we're trying to predict these properties here, right?
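To make the pattern-based alternative concrete, here is a toy sketch of the idea; the patterns, sentences, and function name are invented for illustration and are not the speaker's system:

    # Toy illustration of pattern-based attribute extraction: hand-written
    # surface patterns that pull a population value for an entity out of text.
    # Patterns and example sentences are made up for this sketch.
    import re

    PATTERNS = [
        r"population of (?P<ent>[A-Z]\w+) is (?P<val>[\d,.]+(?: (?:million|billion))?)",
        r"(?P<ent>[A-Z]\w+) has (?P<val>[\d,.]+(?: (?:million|billion))?) inhabitants",
        r"over (?P<val>[\d,.]+(?: (?:million|billion))?) people live in (?P<ent>[A-Z]\w+)",
    ]

    def extract_population(sentence):
        """Return (entity, value) tuples matched by any of the patterns."""
        hits = []
        for pat in PATTERNS:
            for m in re.finditer(pat, sentence, re.IGNORECASE):
                hits.append((m.group("ent"), m.group("val")))
        return hits

    print(extract_population("The population of Italy is 60 million, give or take."))
    print(extract_population("Over 60 million people live in Italy."))

The point of the contrast is that such patterns already build in exactly which contexts count as evidence for the attribute, whereas the bag-of-words vector deliberately does not.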
And so essentially, we already know how to create those vectors here just by counting, and then the question is, can we also jump over that gap in some way? Okay, so as I said, if you define such detailed context patterns, then that gap is relatively easy to cross, but this is exactly not what we want. Our question is: are they learnable from bag-of-words distributional vectors, so from the kind of thing that you see down there?

So what we did is we said, okay, let's try and start with the easiest thing we can come up with, which is a simple supervised learning setup. That makes the task related to what people in the semantic web community call knowledge base completion: you have a partial knowledge base, for example you know the population for a couple of countries, and you want to predict the population and the other attributes for the remaining countries. That is right now a very active area, not least because many companies like Facebook and Google and so on are highly interested in these kinds of things. People have developed very sophisticated methods to look at this, and our approach was essentially to start with the simplest possible setup, because we're always interested not only in making something that works, but also in understanding what works and what doesn't. So what we did was simply to build a logistic regression model, or rather a series of logistic regression models, where we learn each output attribute, so for example the population attribute of these countries, as a function of the vector for that country, okay? You just observe all the occurrences of Italy in your corpus, and the vector with the co-occurrence frequencies that you get, those are your features; you put that into your regression model, and the output of the regression model is then the population of Italy. Of course we have to normalize the features for this, but that's again a detail, and you should do regularization if you do this, in order to keep the model from overfitting.

We also experimented with a simple feed-forward neural network with one hidden layer, because, as I just said, what you do here is build one model for each output attribute, so one model for population, one model for GDP, one model for area size, and so on. But of course these attributes are correlated, and you might believe that the model can actually profit from learning those models together, and this is essentially what a simple neural network with a hidden layer would do, because it would try to optimize an intermediate hidden-layer representation so that it can jointly predict the values of the output attributes.

So if I understand correctly, and it's very likely I do not: it's a regression model, so potentially in the output the population of Italy could be, although it's 61, it might say 59.5 or something like that, and you would score it according to, say, oh, it's only 5% off, or something like that. Is this actually the purpose?

Yes, I mean, I was going to come to this, but I can already give you a quick preview here: the evaluation is going to be correlation, okay? So if we have 200 countries that we train on.
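As a rough illustration of the setup just described, here is a minimal sketch with scikit-learn; it uses synthetic data in place of the real co-occurrence vectors and attribute values, ridge regression as a stand-in for whatever regression variant was actually used, and Spearman correlation as the preview evaluation. It is not the speaker's implementation.

    # Minimal sketch (assumptions noted above): one regression model per
    # referential attribute, trained on distributional vectors of entities,
    # plus one single-hidden-layer network predicting all attributes jointly.
    # All data below is synthetic stand-in data, not real country statistics.
    import numpy as np
    from scipy.stats import spearmanr
    from sklearn.linear_model import Ridge
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    n_countries, dim = 200, 300                 # e.g. 200 countries, 300-dim count vectors
    X = rng.poisson(3.0, size=(n_countries, dim)).astype(float)   # stand-in co-occurrence counts
    W = rng.normal(size=(dim, 3))
    Y = X @ W + rng.normal(scale=5.0, size=(n_countries, 3))      # 3 correlated toy attributes
    attributes = ["population", "area", "gdp"]

    X_train, X_test = X[:150], X[150:]
    Y_train, Y_test = Y[:150], Y[150:]

    # Normalize the features, as mentioned in the talk.
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

    # (a) One regularized linear model per attribute.
    for i, attr in enumerate(attributes):
        model = Ridge(alpha=1.0).fit(X_train_s, Y_train[:, i])
        rho, _ = spearmanr(model.predict(X_test_s), Y_test[:, i])
        print(f"per-attribute model, {attr}: Spearman rho = {rho:.2f}")

    # (b) One feed-forward network with a single hidden layer, predicting all
    # attributes jointly so that correlations between them can be exploited.
    mlp = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000, random_state=0)
    mlp.fit(X_train_s, Y_train)
    preds = mlp.predict(X_test_s)
    for i, attr in enumerate(attributes):
        rho, _ = spearmanr(preds[:, i], Y_test[:, i])
        print(f"joint model, {attr}: Spearman rho = {rho:.2f}")

The per-attribute models correspond to the series of separate regression models described above; the single-hidden-layer network is the joint variant, where the shared hidden representation is what lets correlated attributes help each other.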