over the baselines. But if you then actually look at the predictions for the individual attributes, you see that the variation between different attributes is extremely high. We have attributes that are learned extremely well and attributes that are not learned well at all. So what we did was an attribute analysis. You see, for example, a couple of binary features that were learned very well. Essentially, what you are doing here is a classification task that asks the model to predict the continent of a country, or membership in an organization. You are also asking for multi-class classification with respect to organizations, form of government, and so on. These are things that are learned relatively well. And these are things that are not learned well: the places it's reported to, and the currency used. I'll come back to those attributes in just a second; let me first quickly give you the outcome of our analysis here. The problem with the currency is that essentially every country has its own currency, so that is not something that comes out well in the distribution of vectors; it's very scattered information. And the places a country is reported to is a very fine-grained aspect of knowledge about the country, which is also difficult to grasp.

Let me come back to the first point. It's not about the math, so this will be a bit philosophical. You're talking about which features the model predicts best, but I'm guessing there are also certain countries for which it predicts features better. For the major countries, for example, usually everything goes better. And then there are countries which are less well defined. For example, if I ask: what's the population of Korea? Wait a minute, North Korea, South Korea, or both Koreas? And I guess in your model of the world, you have the entities.
So you have only South Korea and North Korea. But in the data, of course, both senses will be mixed. I don't know the actual populations, but let's say each of the two Koreas has 30 million people, so together they have 60 million. Now, in the data there's a part telling you it's 30 and a part telling you it's 60, and if we could actually see the vector space, these signals are all mixed together; nothing tells you that it's 45. So what I'm getting at is: do you have a sense of confidence? Because in this case you should know that your confidence is low, since you have these two very strong signals, either 30 or 60.

That's a good question which touches on a number of fundamental methodological points. One of them, as you mentioned, is ambiguity. In particular, with named entities we often have a precise way of stating things and also an informal way of referring to the same entity. And you have it the other way around too: one country can be referred to in many different ways. You can say the US, the USA, the United States, the United States of America, and so on and so forth. Now, the data that we work with here, or rather the vectors that the Google News people built, essentially used a pretty strict mapping between the realization in language, as a kind of multi-word expression, and the Freebase entities. So if Freebase had South Korea as a country, then only occurrences of the exact string "South Korea" would count for that entity.

So you were strict; you didn't do any kind of synonymization?

Right, we were strict. That means you lose a lot of data, but at least you keep things clean. As a matter of fact, I'm going to come back to that in the second part of my talk, if we get to it, because we continue there with personal names.
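The Korea situation above can be made concrete with a small numeric sketch (toy numbers, nothing from the actual experiments): if one mention string carries two conflicting population signals, a squared-error learner can only predict their mean, and the spread of the residuals is the only hint that confidence should be low.

```python
import numpy as np

# Toy setup: the ambiguous string "Korea" maps to a single vector, but the
# training targets mix two real entities (hypothetical populations, millions).
targets = np.array([30.0, 30.0, 60.0, 60.0])  # some sentences mean one Korea, some both

# With only one representation there is only one possible prediction, and
# squared-error training picks the value minimizing sum((t - p)^2): the mean.
prediction = targets.mean()       # lands at 45, matching neither entity

# The residual spread is the signal that confidence should be low.
residual_std = targets.std()      # 15: large relative to the prediction
```

A model that reported something like `residual_std` alongside its prediction could flag exactly the low-confidence case raised in the question.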
With names of people you have that problem much more, even much more. One of the examples I'm going to talk about in a couple of slides is Einstein. It turns out we had a model that essentially predicts categories for people, and for Einstein it predicted, I think, a sedimentologist. We were wondering why, and it turns out that Einstein had a son, and this son was one of the leading theoreticians of river flows and sediment, or something like this. So it wasn't wrong; it was just the wrong person. And when you look at actual text, what will often happen is that the text gives the full name at the first occurrence and then just uses an abbreviated form.

So you could have: what's the population of Java? It's an island and also a programming language. The question is, can you actually say at training time that this sentence is not about this entity, because it's so far off?

Right. And this is what I mentioned in passing a couple of slides ago: ambiguity is a fundamental problem in distributional semantics. But people do joint training and disambiguation: you start building a distributional representation as you go through the text, and after a couple of sentences you have enough evidence to decide, when you see something new, whether it's likely to refer to the same concept or to a different one. It becomes a kind of clustering task.

And you hope you're right, because if you're wrong, you'll really mess everything up.

Exactly. The problem is that you don't really know what the right number of clusters is, because every word has a different number of senses. And even if you know in principle what the right number of senses is, they are typically hierarchical, so you get hierarchical structures, and it depends on how fine-grained you want to make those distinctions. Okay.
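The clustering view of sense induction just described might look like this in code: a greedy toy version in which each new occurrence either joins an existing sense cluster or opens a new one. The similarity threshold is a made-up stand-in for the unknown true number of senses, which is exactly the open problem raised above.

```python
import numpy as np

def induce_senses(context_vectors, threshold=0.8):
    """Greedy online sense induction: each new occurrence either joins the
    closest existing sense cluster (cosine similarity >= threshold) or
    opens a new sense. Returns per-occurrence assignments and sense count."""
    senses = []        # list of (running centroid sum, count)
    assignments = []
    for v in context_vectors:
        v = v / np.linalg.norm(v)
        best, best_sim = None, -1.0
        for i, (centroid, _) in enumerate(senses):
            sim = float(v @ (centroid / np.linalg.norm(centroid)))
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= threshold:
            centroid, count = senses[best]
            senses[best] = (centroid + v, count + 1)  # update running centroid
            assignments.append(best)
        else:
            senses.append((v.copy(), 1))
            assignments.append(len(senses) - 1)
    return assignments, len(senses)

# Two clearly separated "senses" of an ambiguous word (toy 2-d contexts,
# e.g. island-like vs. programming-like contexts for "Java").
island_ctx = [np.array([1.0, 0.05]), np.array([0.98, 0.1])]
language_ctx = [np.array([0.05, 1.0]), np.array([0.1, 0.97])]
assignments, n_senses = induce_senses(island_ctx + language_ctx)
```

With clean toy contexts this recovers two senses; with real text, the threshold choice is where the "right number of clusters" problem reappears.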
I think I do have an answer to that. Okay, so with the numeric attributes, what we find is that geolocation works really well; this is something that other people have also found before. And GDP per capita, for example, also works very well. Just to remind you what this means: there is a correlation coefficient of 0.9 between the countries ordered from north to south, or from east to west, according to the ground truth and according to the prediction of the model, and similarly with all of these other attributes. What's interesting here are observations like this one: GDP per capita is relatively easy to predict, while the nominal GDP, the absolute GDP, is quite a bit more difficult to predict, which might be surprising at first.

Sorry, it's very surprising.

But actually there is a nice explanation, and I'll come to that in a minute. And then we have the founding date, which is literally the year when the particular country was established as a country, and the religion percentages, the percentage of Buddhists and so on. These are hard to predict, okay? Very hard to predict; they come out wrong here.

It looks like it should be easier to predict. What was the founding date of Armenia?

Well, Armenia is hard. And that's another problem here: some of these concepts don't have a unique operationalization. I think France is in the Fifth Republic now, so which founding date would you count? But my intuition is that this comes back to the slide I had at the beginning about the different types of context.
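As an aside, the correlation of 0.9 mentioned above is a rank correlation between the two orderings of countries. A minimal self-contained sketch, with toy latitude numbers rather than the real data:

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks.
    (Assumes no ties, which holds for distinct country values.)"""
    ra = np.argsort(np.argsort(a)).astype(float)  # rank of each entry
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

# Hypothetical latitudes: predictions preserve the north-to-south ordering
# except that the middle pair comes out swapped.
truth = np.array([60.0, 48.0, 41.0, 35.0, 20.0])
predicted = np.array([58.0, 45.0, 36.0, 38.0, 15.0])
rho = spearman(truth, predicted)   # 0.9: one swap in five countries
```

So a coefficient of 0.9 means the predicted ordering is almost, but not exactly, the true ordering.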
And if you do what we did and only look at the bag-of-words contexts, then this is very, very difficult to get right, because you essentially have to pick a number, and there are so many numbers just in the context of any country that picking the right one and figuring out that this is the right founding date is very, very hard, okay?

Okay, but maybe let's start with some examples that actually work. If you look at the geolocation data, for example, here are a couple of countries with the actual positions, and the predicted positions in blue. You see these tend to be quite close to one another, but then there are very interesting effects. This also goes with something that was said before: for some countries some attributes are easy, for others they are hard. Here are a couple of cases where the geolocation is quite bad, and the interesting thing is that they tend to cluster somewhere around here. We looked at what all of these are, and it turns out they're all small island states. I would guess that if you asked most people, they would not be able to tell you where those are either. Where are the Cook Islands?

So you could do like Microsoft and make a blog post that says your AI is better than humans.

Yeah. And funnily enough, one way to look at this is that the model seems to do the same thing that people do. People say: I don't know, it's an island; what other islands do I know? Maybe the Seychelles, or Mauritius; it's probably somewhere in the Indian Ocean. And this is actually where all these islands are predicted to be. My argument loses a little bit of coherence from the fact that Mauritius itself is predicted to be in India, okay?
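A sketch of the kind of geolocation probe described here, under entirely synthetic assumptions: latitude and longitude are planted in two dimensions of a made-up embedding, and a closed-form ridge regression is asked to recover them. Real experiments would fit the regressor on pretrained vectors instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical embeddings whose first two dimensions loosely encode
# latitude/longitude; the remaining dimensions are irrelevant noise.
n, dim = 40, 10
true_coords = rng.uniform(-60, 60, size=(n, 2))            # (lat, lon) per country
X = np.hstack([true_coords / 60.0, rng.normal(size=(n, dim - 2))])

def ridge_fit(X, Y, lam=1e-3):
    """Closed-form ridge regression mapping embeddings to coordinates."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)

W = ridge_fit(X, true_coords)
pred = X @ W
mean_err = np.abs(pred - true_coords).mean()   # small: the signal is recoverable
```

When the embedding carries little signal for an entity, as with rarely mentioned island states, the regression can only fall back on whatever its vector neighbors suggest, which matches the clustering of errors described above.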
But what we think happens here is essentially, as I said, the following: the model, if you remember, tries to predict these values from the distributional behavior, which means that if two vectors.