Hey guys, it's MJ the Student Actuary, and in this video I want to look at some of the reasons why the polls were wrong. We were led to believe that Trump only had a 15% chance of winning. Oh boy, how wrong were we? So what I want to do in this video is look at the five reasons why the polls were wrong. Here they are at a glance, but I'm going to look at each of them in a little more detail.

The first one is this idea of a biased sample. We want to go out and find the information by saying, "You're an American; who would you vote for?" But when we introduce the polling mechanism, what we're basically saying is, "You are an American who is being exposed to this poll; who are you going to vote for?" So we've added in this extra condition, which alters the probabilities. From a graphical point of view, you've got your population, and only a certain subset of your population is exposed, or more exposed, to the polling mechanism.

The classic case in statistics is from years ago, when not everybody had a telephone. Only the rich used to have telephones, and when surveys were conducted via telephone, the data they were getting was distorted towards what rich people thought, so it wasn't accurate. With this election, the polling mechanism could have been as slight as the type of website people visited. Certain media websites that ran polls might have had a slight bias, or the users who normally visit those websites might have had a slight bias towards one of the candidates. Hence all the polling data we got from these websites is basically kind of worthless.

Next I want to talk about this idea of the shy voter. This is something that also popped up a lot with the Brexit vote. The shy voter is someone who wants to vote for Trump but doesn't want to tell his friends that he's voting for Trump, because voting for
Trump was kind of like associating yourself with racism and all these terrible things. Trump supporters used to get told, "Oh, you're stupid," or "You're racist," all these types of things. So they would become shy; they wouldn't tell people that they supported Trump. And even though polls were supposed to be anonymous, these people may not have believed it, or thought, "You know what, let me not risk it, in case somebody finds out and shares this poll result on my Facebook page, and then I'm terribly embarrassed." So once again, what we're seeing is that the poll mechanism is separating these groups of people in the wrong way, which means the polls are not getting the right results and we're not getting the truth.

But that's not the only problem with the polls. They also made some terrible statistical mistakes. One of those is extrapolation abuse. This is another classic statistical mistake. It's impossible to ask everybody who they're going to vote for. America is massive. There are millions, almost hundreds of millions, of people voting in this election, so they can't go and poll every single person. What they do is poll a certain sample; they just ask a few individuals. In this case, what I've got on the screen is that they ask four people. Three of them are Democrats and one of them says they're going to vote for Trump. And then they say, "Hey guys, look at this. We've got our little sample data, this is what it's saying, let's just simply extrapolate the results." And this is a dumb thing to do, specifically with America, where there are so many people voting that even if you poll a couple of million people, your sample size is still going to be too small to be statistically significant. Also, voters tend to clump together.
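
As a rough illustration of the clumping point, here is a minimal Python sketch. Everything in it is a made-up assumption for illustration only: the 50 areas, the range of support levels, the sample sizes, and the function name `poll_spread`. It compares how much repeated polls disagree with each other when the same number of voters is drawn from a handful of areas versus drawn independently from everywhere.

```python
import random
import statistics


def poll_spread(clustered, n_polls=500, n_areas=50, areas_polled=5,
                voters_per_area=100, seed=0):
    """Return the standard deviation of poll estimates over repeated polls.

    Hypothetical electorate: every area has its own level of support for
    a candidate, so voters in the same area tend to vote alike.
    """
    rng = random.Random(seed)
    # Assumed support level per area, anywhere from 20% to 80%.
    area_support = [rng.uniform(0.2, 0.8) for _ in range(n_areas)]
    sample_size = areas_polled * voters_per_area
    estimates = []
    for _ in range(n_polls):
        if clustered:
            # Poll many voters, but only inside a few areas.
            areas = rng.sample(range(n_areas), areas_polled)
            votes = [rng.random() < area_support[a]
                     for a in areas for _ in range(voters_per_area)]
        else:
            # Poll the same number of voters, each from a random area.
            votes = [rng.random() < area_support[rng.randrange(n_areas)]
                     for _ in range(sample_size)]
        estimates.append(sum(votes) / sample_size)
    return statistics.pstdev(estimates)


if __name__ == "__main__":
    print("spread, clustered poll:  ", round(poll_spread(clustered=True), 4))
    print("spread, independent poll:", round(poll_spread(clustered=False), 4))
```

Under these assumptions the clustered polls swing much more from one poll to the next, even though both designs ask the same number of people. That is the sense in which clumping makes simple extrapolation riskier than the raw sample size suggests.
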
We kind of saw that with the states. A lot of people in certain areas voted for the same party, and that's something else the polls would need to get around before they can simply extrapolate.

Now, another statistical mistake they made was that they said, "Okay, we're going to take this poll data, and we're also going to take historical data, and we're going to build a fancy model. We are going to predict who the next president is going to be based on what the poll data is saying, what the economic data is saying, what this data and that data is saying." They get all this information. Funnily enough, one of the pieces of information they look at is the height of the candidate, and there's a study that has shown that the taller candidate tends to win. Now, we're seeing that it's a correlation. But if this is a cause, it means that next time the Democrats want to win, they just need to go out and find the tallest human being they can find. So either this correlation reflects a real cause, or it is merely a coincidence that the taller candidate seems to get elected more often.

What the guys building these models were doing was looking at so much historical data that, just by sheer chance, they were finding factors that were correlated with the results. Their big mistake, though, was assuming that these factors were therefore the cause. So, I don't know, say the length of ties has favored the Republicans just by chance in the past, and they go and measure the ties of people today and say, "Oh well, because it's this certain measurement, it must favor the Democrats now." These are some of the silly things that we see in some of these prediction models around the elections. Now, there are some smart people, and they did make somewhat more intelligent models. But
they were also making errors, and one of them is the normal distribution error. The normal distribution is an interesting thing. I spoke about it in my video back in February, where I said that we keep making this assumption that the population is normally distributed between Democrats and Republicans. But if it is skewed, then that's going to heavily help Trump, and that seems to be the case. So we see that the normal distribution has failed us in politics already.

But what these guys are also doing is saying, "Okay, we understand that the polls are not perfect, and we're going to introduce something known as an error term to our model." But then they go and assume that the error term is randomly distributed on a normal distribution. Now why do they do this? Well, because their school stats textbook says so. It works nicely with the mathematics and it's very nice for the formulas, so we just assume that it's normally distributed. However, if you do a university course in statistics, or you learn a little bit more about the subject, you see that in reality the error terms are very rarely normally distributed. They tend to have more fat in their tails. And funnily enough, this is the exact same mistake bankers and credit agencies made back in 2007, when they assumed that market movements would be normal and thus underestimated the whole risk of that financial recession.

So I guess the lesson here is that a little bit of statistics is dangerous. Alexander Pope said a little bit of learning is dangerous, and we're seeing that with stats. People think that they understand the basics and that they can then just blindly apply extrapolation and correlation and cause, when in fact they don't actually understand what they're doing. Now this is bad.
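
To put a rough number on the fat-tails point, here is a minimal Python sketch. The choice of a Student's t distribution with 3 degrees of freedom as the "fat-tailed" alternative is purely an assumption for illustration, as are the function names; nothing in the video says which distribution real poll errors follow. The t distribution is rescaled to have the same variance as the standard normal, so the two are compared on the same footing.

```python
import math


def normal_tail(k):
    """P(Z > k) for a standard normal error term."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def fat_tail(k):
    """P(X > k) for a Student's t with 3 degrees of freedom,
    rescaled to unit variance (closed form for this special case)."""
    return 0.5 - (k / (1 + k * k) + math.atan(k)) / math.pi


if __name__ == "__main__":
    # Chance that the error exceeds k standard deviations under each assumption.
    for k in (2, 3, 4, 5):
        n, f = normal_tail(k), fat_tail(k)
        print(f"{k} sd: normal={n:.2e}  fat-tailed={f:.2e}  ratio={f / n:.1f}")
```

Around two standard deviations the two assumptions roughly agree, but by four standard deviations the fat-tailed error is on the order of a hundred times more likely than the normal model says. A model that assumes normal errors will therefore look far more certain about "impossible" outcomes than it should.
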
Okay, and actually we need to be aware of this. When we build models, when we use stats, we need to be aware that there are these dangers in being overconfident, and we need to think through each thing before we apply it. Because what's happening now is that statistics is a powerful subject, but it's getting a very bad name. You're getting these statements like "there are three types of lies: lies, damn lies, and statistics." But that's not true. The fact is that people are abusing stats. This isn't the first time, and I can tell you that it's not going to be the last time.

And that's what's happened: by abusing stats, we've got the wrong results and we've made the wrong predictions, and now everybody's in shock. It happened with Brexit. It's happened with Trump. And gosh, I really don't know when it's going to happen next. But anyway, thanks so much for watching, and I'll see you for the rest of my videos. Cheers, guys.