part of those, and then what we want is that the predictions for those countries essentially show the right ranking.

But in the training data you potentially have sentences saying "over 60 million people", or "dozens of millions", or something like that. Is the error driven by those kinds of things, or is it more just misinterpretation of the training data, where the model sucks in something completely wrong, like "millions of tourists visit Italy every year" versus "10 million tourists visit Italy every year", so that the 10 million gets into the average?

Well, the funny thing is, and I'm going to come to that in much more detail in a couple of minutes, that even if we learn to predict numbers, the numbers in the input text don't matter that much. We did not actually try this, but here's my hypothesis: if you took the corpus that you start with and replaced all the number expressions by some kind of placeholder, or just deleted them, I would guess that the performance of the model doesn't change a lot. What the model actually does is learn typical properties of large countries versus typical properties of small countries.

That means you have a ground truth, that you've already told it the populations of, say, Spain and Portugal, and then it predicts Italy, right? It's supervised learning?

Yes, it's supervised within each attribute. We need some instances for each attribute, for example countries with their values, but we do not then need textual observations. Essentially, yes; I was going to come to this on the next slide. We have two kinds of attributes, numeric attributes and categorical attributes like form of government, which we binarize; and since we just wanted one model, we normalize the numeric ones, we center them.

You mentioned that you had specific examples for each attribute that you wanted to predict. So were you predicting on textual samples that contained sentences like these, or on any text?

Maybe just hold that thought; in two or three slides I may go over to the board and put down the structure of what the setup was.

So people at Freebase would be very interested in something like this, right? I mean, it's a collaborative resource...

No, I mean, if you have a collaborative resource like Freebase, then the country size, for example, is one of the first things that people list for a country. But essentially it's, as usual, a Zipf distribution: you have some attributes with values for every country, but for almost all attribute types you have missing values for many, many countries.

Okay, so we also tried a joint learning model here, which we thought would do better because it allows one to jointly learn the attributes, but the results actually turned out not to be better than for the simple logistic regression model; we're still investigating why that is. Pre-processing, I already mentioned this: we normalize the numeric attributes, we binarize the categorical attributes, and we remove attributes that occur very infrequently.
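To make that pre-processing concrete, here is a minimal sketch, assuming the attribute table lives in a pandas DataFrame; the function name, the column handling, and the min_count threshold are illustrative assumptions, not details from the talk.

```python
import pandas as pd

def preprocess_attributes(df: pd.DataFrame, min_count: int = 10) -> pd.DataFrame:
    """Pre-process an entity-by-attribute table as described in the talk:
    drop rare attributes, center/scale numeric ones, binarize categorical ones."""
    # Remove attributes that occur very infrequently across entities.
    df = df.loc[:, df.notna().sum() >= min_count]

    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")

    # Normalize (center and scale) the numeric attributes.
    numeric = (numeric - numeric.mean()) / numeric.std()

    # Binarize categorical attributes such as form of government:
    # one binary column per attribute-value pair.
    categorical = pd.get_dummies(categorical)

    return pd.concat([numeric, categorical], axis=1)
```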
And maybe the next slide is where I should give you that picture. Essentially the way it looks is that you have a simple table, for example the population table from Freebase. Then you say: here are my training examples, which I treat as seen, and here are my test examples, where I assume the values are unseen. So here is a country, and here is its value; in the test set the values are unseen. What you do is use the training examples to train your logistic regression model. The output of the model is the value that we get from the population table, and the input is essentially the vector that we got from a large corpus, the Google News corpus, for which some billions of tokens of English news were collected; this is a 1000-dimensional vector. You use this to predict the values. Then you take that model to make new predictions: again, as input you take the distributional vector from the Google News corpus, and you essentially reuse the learned weights to predict the value.

Essentially that's maximum likelihood? Yes, it's maximum likelihood.

Okay, so we looked at a couple of domains, and I'm going to concentrate on a few of them here: countries, animals, and employers. For the evaluation we always have to distinguish between the numeric attributes and the binary attributes. For the numeric attributes, as I said before, we use a rank correlation coefficient: we ask whether the ranking that the attribute induces on the entities is correctly predicted by the model. For the binary attributes we just report accuracy. In terms of baselines to compare against: for the numeric attributes there is a very simple baseline, the mean feature value from the training set; for the binary attributes you can do majority-class classification. The upper bound is a model where you give it the Freebase vectors as input; that is, not the distributional vectors but really the vectors from the knowledge base, the actual referential attribute vectors. In principle that model should be able to learn the task perfectly, but as a matter of fact it doesn't.
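A rough, runnable sketch of this pipeline, under some assumptions: the standard public 300-dimensional GoogleNews word2vec vectors stand in for the distributional vectors (the talk mentions a 1000-dimensional representation, so the exact choice may differ), ridge regression stands in for the regression model, and the handful of training and test countries with approximate populations are invented for illustration.

```python
import numpy as np
from gensim.models import KeyedVectors
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge

# Pre-trained distributional vectors from the Google News corpus.
vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

# Toy attribute table: (country, population) pairs, seen vs. unseen.
train = {"Spain": 46.4e6, "Portugal": 10.3e6, "Germany": 80.6e6}
test = {"Italy": 59.8e6, "Austria": 8.5e6}   # values held out at test time

X_train = np.array([vectors[c] for c in train])
y_train = np.array(list(train.values()))     # would be centered/normalized

# Train: distributional vector in, attribute value out.
model = Ridge().fit(X_train, y_train)

# Predict: reuse the learned weights on the vectors of unseen countries.
X_test = np.array([vectors[c] for c in test])
pred = model.predict(X_test)

# Evaluate with a rank correlation coefficient: does the model induce
# the right ranking on the test entities?
rho, _ = spearmanr(pred, list(test.values()))
print("Spearman rho:", rho)

# The very simple baseline predicts the mean training value everywhere,
# which induces no ranking at all (every rank is tied).
baseline = np.full(len(test), y_train.mean())
```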
So here are a couple of results. If we first look at the binary attributes, you see that the baseline is very high, in particular for employers, because when we have categorical attributes with many values we get lots of binary attributes, almost all of which are false. We have in the meantime done some work on changing the coding of the features there, but these are the stable numbers that I wanted to show. Our own model improves over the baseline; of course, it doesn't quite reach the upper bound, which here is almost one. I mean, we do a logistic regression, and in the case of binary attributes we interpret the output of the regression as the probability of the feature being true.

So the majority class is "false", and the feature holds true for 46% of the labels? Basically yes; I mean, it's a binary classification task.

Okay, and if we move to the numeric features, what you see is that the baseline is lower; overall I guess it's a more difficult task, but we do get a substantial improvement here over the baseline.
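And the binary side, hedged similarly: a self-contained sketch where the logistic regression output is read as a probability and compared against the majority-class baseline. The vectors and labels here are purely synthetic stand-ins to keep the example runnable; with real data the model, not the baseline, should come out ahead.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-ins: 300-dimensional "distributional vectors" and a
# sparse binary attribute (mostly false, as with binarized categoricals).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))
y = rng.random(200) < 0.2

X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Interpret the regression output as P(attribute is true) and
# threshold it to get a predicted label.
prob = clf.predict_proba(X_test)[:, 1]
pred = prob >= 0.5

# Majority-class baseline: always predict the most frequent training
# label, which is already very accurate for sparse binary attributes.
majority = np.full(len(y_test), y_train.mean() >= 0.5)

print("model accuracy:   ", accuracy_score(y_test, pred))
print("baseline accuracy:", accuracy_score(y_test, majority))
```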