Okay, thank you. I am going to talk about a project analyzing and forecasting the outcome of the elections. This is a joint effort of CentERdata, the data collection and data analysis institute here on campus, and Tilburg University, the Tilburg School of Economics and Management, in particular the Data Science Institute. Several people collaborated on this project; here you see four of them. One of the main persons is Jochem, who is present and who did all the difficult work, and who will also be able to answer any difficult questions you may have. I should also mention, of course, Marcel Das, who kindly provided, or organized, funding for this project, and participated in the brainstorming and the organization of the whole thing. Now, you have already seen these quiz questions, so I can skip them for now; I'll come back to them later. We did much more than just forecasting the election outcome: we also looked at all kinds of relations to background variables, and that is what these questions emphasize. We'll get to the answers towards the end of this presentation. So what is the nature of this election poll; what distinguishes it from many other election polls? Essentially we are following an approach that has been used in the United States to forecast the presidential elections there in 2012 and 2016. The first difference with many other polls is the way in which we ask the question. Essentially we ask two questions, and we ask them in the form of probabilities. We don't ask for which party you intend to vote. No, we ask: what is the probability that you will vote for party A, party B, party C, and so on. And the probabilities they give have to add up to 100%, otherwise people get stuck in the questionnaire.
So that's one thing: it means we take account of the fact that people don't really know yet what they are going to vote. Many people don't give only zeros and 100s; they also give numbers in between, and that makes it possible to take account of their uncertainty in a precise way. We do the same for the probability that they are going to vote at all. So instead of just asking whether they are going to vote, or whether they voted in the previous elections, which is what some polls tend to do, we ask them what the probability is that they will vote. That makes it possible for us to weight the parties they choose by this probability, so that we can take account of whether they will actually vote or not. The other important difference is that we don't use a so-called convenience sample. We use what Statistics Netherlands calls a true probability sample: our sample is based on a random selection of all addresses in the Netherlands, provided to us by Statistics Netherlands. The LISS panel is a panel set up and managed by CentERdata that selects a large number of Dutch households randomly from the Dutch population; there is no self-selection into that panel. That makes the quality of this panel, as a representative panel of the Dutch population, better than many other panels used for this kind of thing. I'll come back to some characteristics of the LISS panel in a few moments. Now, if we look at what happened in the US and why we decided to also try this here: here is a picture of the 2012 elections. Blue is the percentage voting for Obama; red is the percentage voting for the other candidate, whose name you may not remember, Mitt Romney. And the grey area, well, after all we are statisticians, so we also have something like an uncertainty margin.
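The turnout-weighting step just described can be sketched in a few lines of code. This is a hypothetical illustration, not the project's actual code: each respondent reports a turnout probability and party probabilities that sum to 100, and a party's expected vote share is the turnout-weighted average of the party probabilities.

```python
# Hypothetical sketch of the turnout-weighted aggregation described above.
# Each respondent gives: p_vote (0-100) and party probabilities summing to 100.

def expected_shares(respondents):
    """Turnout-weighted average of party probabilities, as percentages."""
    weighted = {}   # accumulates p_vote * p_party per party
    total_w = 0.0   # sum of turnout probabilities
    for r in respondents:
        w = r["p_vote"] / 100.0
        total_w += w
        for party, p in r["parties"].items():
            weighted[party] = weighted.get(party, 0.0) + w * p
    return {party: s / total_w for party, s in weighted.items()}

# Invented toy data: a certain non-voter gets weight zero, as in the talk.
respondents = [
    {"p_vote": 100, "parties": {"VVD": 60, "PVV": 40}},
    {"p_vote": 50,  "parties": {"VVD": 20, "PVV": 80}},
    {"p_vote": 0,   "parties": {"VVD": 100, "PVV": 0}},  # weight zero
]
shares = expected_shares(respondents)
```

Note how the third respondent, who says with certainty that they will not vote, contributes nothing to the party shares, exactly as described later in the talk.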
The grey area means that if the probabilities are in that area, the difference is not statistically significant. You might expect the probabilities here to be about 50%, but people can also vote for a third candidate or not vote at all; that's why the probabilities for the two main candidates don't add up to 100. Now, if you look at the end of the time period, shortly before the American elections in 2012, the predicted difference between Obama and Romney was very close to what actually happened in the elections. That gave people some confidence in this method. In 2016, well, maybe I shouldn't talk too much about this because it's kind of depressing, but essentially the same thing happened. Unlike many other polls, this forecast predicted that Trump would win the popular vote. As you may know, he didn't actually win the popular vote, but he still won the elections, and in the media this poll became known as one of the few polls that got it right. That again made this way of polling more popular. So we tried to apply it in the Netherlands. Now, of course, the main difference between the Netherlands and the US in this respect is that in the Dutch system we have many parties. I'm not going to go through the history of the Dutch electoral system, because I suppose most of you are familiar with that. The main point is that we have many parties; it's not a choice between essentially two candidates but between very many candidates, and the threshold for entering parliament is essentially the number of votes you need to get one seat, one out of the 150 seats, which is 0.67% of all the votes. I'm not going to talk about the main results yet; let me skip that for the moment and talk about the data. As I said, the LISS panel is administered by CentERdata. In total it contains about 8,000 individuals in 5,000 households.
We don't use the complete panel for this; we use about half of it, roughly 3,000 individuals in every survey that we take. It is a random sample from the population; you cannot self-select into the sample. The survey is carried out over the internet. Now, in the Netherlands internet is very common, but there are still people who don't have it. If they don't have internet or a computer, the LISS panel provides them with the necessary tools: a simple computer and access to the internet. So coverage is not limited to the population with internet access. We started on January 18 and stopped the day before the elections; I'll explain the interviewing schedule in a minute. Before this, we also ran a kind of pilot in December and January, using a single cross-section of people, with some randomizations to test how things worked, for example to check for order effects. I'm not going to discuss those today; I'll just talk about the main survey. Now, the graph at the bottom shows the schedule. Every respondent participating in this study is interviewed once a week; that is, once a week there is an invitation to answer the questions on voting behaviour. The part of the panel that we use is divided into seven parts: in total we have about 3,500 people, and every day 500 of them get an invitation. They then have one week to answer the questions, after which they get a new invitation for the next week. So they can answer the questions once a week; it's a rolling window of seven days. This matters because the results I'll show later are always based on the past seven days, which means we are a bit slow in picking up responses to dramatic events that led to immediate changes in people's voting behaviour.
It takes about seven days before we see them completely in our panel. Of course the panel is not perfect: some people don't participate in this kind of survey, so we correct for that using weights based on basic characteristics, as most panels do. A nice feature of the LISS panel is that we essentially have a lot of background information on all these people: their age, education, income; whether they own their home or rent it; whether they have a migration background; et cetera. We also know how these variables are distributed in the complete population of Dutch adults, so we can correct for that using weights. In the results that I will present for voting behaviour, we of course also weight by the probability that people will go and vote. So if somebody says that he or she will not vote with probability 100%, this person gets weight zero in the probabilities for voting for a certain party. Now, this picture is actually the final forecast for all the parties that we have. As you probably know, there were 28 parties, and we didn't account for all of them. There is something in the Netherlands called the Peilingwijzer, which is basically a summary, a mean, of the six main election polls. At the beginning, in January, the Peilingwijzer used, I think, 14 parties: the 12 that were in parliament, plus VNL and DENK. Later on they added FVD, Forum voor Democratie. If you have very good eyes, you can see that at the very end we also added Forum voor Democratie; it enters right here, because it had a tendency to grow and to get more than one seat, exceeding the threshold for getting into parliament. So we decided to follow the Peilingwijzer and add them. Moreover, they had already threatened to sue us, which was an additional reason to add them. Now, maybe you find this graph difficult to read.
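The demographic weighting can be illustrated with a minimal cell-weighting sketch. The actual weighting scheme used for the panel is more elaborate; the cell names and shares below are invented: each respondent's weight is the population share of their demographic cell divided by that cell's share in the sample.

```python
# Hypothetical sketch of post-stratification weighting: weight each
# respondent so that the weighted sample matches known population shares.
from collections import Counter

def cell_weights(sample_cells, pop_shares):
    """Weight per respondent = population share of their cell / sample share."""
    n = len(sample_cells)
    sample_share = {c: k / n for c, k in Counter(sample_cells).items()}
    return [pop_shares[c] / sample_share[c] for c in sample_cells]

# Invented example: young adults underrepresented in the sample.
sample = ["young", "old", "old", "old"]        # 25% young vs 75% old
population = {"young": 0.5, "old": 0.5}        # 50/50 in the population
weights = cell_weights(sample, population)     # young up-weighted, old down-weighted
```

In the talk, these demographic weights are combined with the turnout probability, so a certain non-voter ends up with weight zero in the party forecasts regardless of their demographic weight.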
So let me go to the real thing for a moment. You see, there's nothing here yet; this is the website of the study. If I select all parties, you see the same graph that you saw before, but now you can also see exactly what's going on. This is the VVD on March 13: you can see exactly how many seats are predicted here, and you can also see the confidence interval; again, that's the shaded, coloured area giving the precision of these things. If you find this too difficult, you can also look at just a few parties, like the five largest, or select a few yourself. For example, if I first deselect all parties and then take only the Labour Party, you see the rather sad thing happening to the Labour Party here: it keeps falling and falling. The populist party is falling a little bit. If you look at the VVD, you see one thing that distinguishes our poll from many other polls: already when we started, at this point in time in January, many polls predicted that the PVV would be the largest party. This was never the case in our poll; we have always had the VVD as the largest party. Actually, the changes that we see over time also seem somewhat smaller than in the other polls. In addition, on every day we gave the current forecast of the number of seats before rounding, with a confidence interval, and the number of seats after rounding and distributing the so-called restzetels, the remaining seats, with the specific algorithm that's used for that, so that every party got an integer number of seats. This is our final forecast, on March 14, on the website. This one? Yes. Well, that is of course the question everybody is now asking. I'll come back to that at the end, but it's not perfect. Yes, yes, sure. The question is how successful this is. I'll come back to it; let me keep you in suspense.
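The restzetel step mentioned above can be sketched with a simplified largest-averages allocation. This is a hedged illustration, not the exact Dutch procedure (it omits thresholds and list combinations, and the vote counts are invented): parties first get the integer part of votes divided by the quota, and each remaining seat goes to the party with the highest average votes per seat if it were granted one more.

```python
# Simplified sketch of a largest-averages ("restzetel") seat allocation.
# Details of the real Dutch procedure (thresholds, list combinations)
# are omitted; vote counts are invented.

def allocate_seats(votes, total_seats=150):
    quota = sum(votes.values()) / total_seats           # the "kiesdeler"
    seats = {p: int(v // quota) for p, v in votes.items()}
    # Distribute the remaining seats one by one to the party whose
    # average votes/(seats+1) is highest.
    while sum(seats.values()) < total_seats:
        best = max(votes, key=lambda p: votes[p] / (seats[p] + 1))
        seats[best] += 1
    return seats

votes = {"A": 480, "B": 320, "C": 200}     # invented vote counts
seats = allocate_seats(votes, total_seats=10)
```

With these numbers the quota is 100, the integer parts give 4 + 3 + 2 = 9 seats, and the single remaining seat goes to party A, whose average 480/5 beats the others.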
But you already rightly remarked that there are some deviations between our forecast and the actual outcome of the elections. Yes. Yes, the things that happened in the very last few days, I don't think we captured them well. I'll show you; let me go back to the slides. Okay, it's the same picture; this is just an enlarged picture of the two biggest parties, but as I already said, we continuously forecasted that the VVD would be larger than the PVV. Now, this is not a comparison to the final outcome yet, but to the Peilingwijzer, the mean of the six main election polls. It's a bit complicated, but a colourful picture: it shows the absolute deviations between our forecast of the number of seats and the Peilingwijzer's forecast of the number of seats. So you see, for example, this light blue one down here. That's... well, that one is not so very interesting. Maybe we should look at the VVD; I think it's this one. There we are actually very much in line with the average of the six, which includes us, so it pulls the average down towards us a little bit. But that's good. Where we do have strong deviations is, for example, here: the top one, that's the SP. There we actually have the biggest average deviation, and that was also at the very end, as you already saw. For the PVV we started with a very big deviation, because in our poll they were much smaller than in the average poll, but later on this converged, and at the very end we had almost the same as the Peilingwijzer. For the CDA, I think this is it, the deviation is also pretty large. Now, one of the things we also have is the probability that people will vote at all. So an interesting thing we can do is check whether turning out to vote depends on party affiliation. That was also in the news; people even said it would determine the result of the election.
Now, here you see that turnout intention for the main parties, the biggest parties, is basically almost the same. The intention to go and vote is above 80% for VVD, PVV and D66, and also for the other main parties on the right-hand side. For the smaller parties it tends to be a little lower. Well, actually, here you see we forecasted a pretty high turnout, and the turnout was indeed very high, although maybe not as high as this; we were a bit too high, particularly towards the end. You also see it increases a bit over time: the tendency to go and vote becomes somewhat larger. No, no, no. What we did do, in one of the experiments: here the question is what the probability is that you will go and vote. In an experiment we also asked what the probability is that you will not vote. And then you do get answers that are, on average, about 6 percentage points lower. That is what people in the literature call yes-saying bias, or something like that. So that kind of thing matters. Maybe next time, if we do this again, we should ask for the probability that people will not vote, or randomize, or something like that. So that explains part of the difference. The main difference with other polls, then, is that we ask these probabilities, and people can indicate that they still attach positive probabilities to more than one party. Here you see how often that happens. Zero means they will certainly not vote for that party; that's the bottom segment of each of these vertical bars. One is week one, and seven is week seven, the end of our poll, towards the time of the real elections. You see that the zeros become more frequent: people become less and less uncertain about what they are going to vote. Now, the zeros are the largest chunk here, so let me also show you the picture without the zeros, dividing the remaining probability mass over the complete bar.
And then you also see that the 100s, the large probabilities to vote for a certain party, also tend to increase; again, people become more and more certain of what they are going to vote. You also see quite some difference between the parties: for the VVD, there are a lot of people who are quite certain that they will vote VVD, or PVV; it's much less so for some of the other parties. Now I come back to the quiz, because I'm going to give you the answers. This I can make a bit larger. Can you read this, or should I make it larger? It's okay. All right. This is about the intention to go and vote at all, not for a specific party. Essentially it shows the associations between the probability that you will go and vote and several background characteristics. The first thing to see here, for example, is that for women, compared to the base group of men, there is no difference at all, because this dot is on the vertical line, which marks the base group. So there is no difference between men and women in their intention to vote. I think this already shows that some of the answers to the first and second quiz questions are incorrect. What does matter, as you see here, is age: older people, compared to the youngest benchmark group, have a substantially higher probability to vote. The dot gives you the point estimate and the line gives you the uncertainty margin. You also see that education matters: compared to primary education, the lowest education level, people with a university education have a significantly and much higher probability to go and vote. So the group with the highest probability to vote is older, university-educated people, and the answer to the second question is just the opposite. I hope you all had that correct; we'll see in a minute. Then we go to the parties: for which party will people vote?
Now you can see the answer to question three, I think, about high income and low income. That was a rather easy question, at least for people familiar with Dutch politics. High-income people, compared to low-income people, mainly have a higher tendency to vote for the dark blue bar here, which is, well, can I call them liberals? The VVD. And the main difference between men and women, question four I think, is what you see here: the gender dummy is significant mainly for the PVV. The PVV is much less popular among women than it is among men. For other parties, like the VVD, gender doesn't play a large role. I also have two other parties here, D66 and the Socialist Party; there is a bit of a gender effect there, but not very much. Then there were two more questions on education. Here you see, for example, that people with a university education have a large tendency to vote for the green one here, which is D66. So D66 is one of them. There is one more party that is very popular among people with a university education; I think it's in my backup slides somewhere: GL, GroenLinks, the Green Left. They are also quite popular among people with the highest education level. That was also one of the questions. The last question was about the oldest age group. You can already see here that the 75-plus group has a huge tendency to vote for the Christian Democrats, and you can guess yourself which other party is popular among the 65-plus: it's here on this slide, the party that calls itself 50PLUS. So these are the correct answers; now you know them. In a minute we'll see who wins the prize, and what the prize is. Okay, let me finally come to the main question that you're all asking: how did we do?
In this graph, which was obviously produced only today, so it doesn't look very colourful yet, but I hope you get the picture: the bold black line, PW, is the Peilingwijzer, the mean of the forecasts of the six main election polls. It's a mean, so it takes away a lot of noise; that's the idea of taking a mean, and it actually does pretty well compared to the individual polls. There are two criteria that we use here: the average absolute difference in the number of seats per party; or, well, if you did statistics when you were here at Tilburg University, you know that we like to take squares and minimize sums of squares, et cetera, so here we also look at squared deviations as a measure of lack of predictive fit. In both cases the Peilingwijzer gets more precise as time passes, so it gets better towards the date of the election. That's the Peilingwijzer, the benchmark. The other bars here are the individual polls, and the bold grey one is us, the LISS panel. You see that in the beginning we did pretty well; how well depends a bit on which criterion you use. But somehow towards the end we lost it. Particularly in the last few days before the election, the quality of our predictions, ex post, seems to have deteriorated substantially. And unfortunately that is what everybody looks at: the most recent predictions just before the elections. So there, as the gentleman over there already said earlier, we don't do that well. Nobody does it perfectly, but we certainly don't do it better than the other agencies. So there is still something that we need to analyze before we can say why this is the case. Sure, I mean, well, I didn't put a to-do list in the slides, but you are perfectly right that we have to analyze this.
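The two evaluation criteria just mentioned are straightforward to write down. This is an illustrative sketch with invented seat numbers, not the actual evaluation data: per party, take the absolute or squared deviation between forecast and actual seats, then average over parties.

```python
# Sketch of the two evaluation criteria from the talk: mean absolute and
# mean squared deviation between forecast and actual seats, per party.

def mae(forecast, actual):
    """Average absolute seat difference per party."""
    return sum(abs(forecast[p] - actual[p]) for p in actual) / len(actual)

def mse(forecast, actual):
    """Average squared seat difference per party (punishes big misses more)."""
    return sum((forecast[p] - actual[p]) ** 2 for p in actual) / len(actual)

# Invented seat numbers, for illustration only.
actual   = {"VVD": 33, "PVV": 20, "CDA": 19}
forecast = {"VVD": 31, "PVV": 24, "CDA": 18}
err_abs = mae(forecast, actual)   # (2 + 4 + 1) / 3
err_sq  = mse(forecast, actual)   # (4 + 16 + 1) / 3
```

The squared criterion weights a single large miss (here the four-seat PVV error) much more heavily than the absolute criterion does, which is why the two rankings of the polls can differ.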
And one of the things that is going to happen today or tomorrow, or in the next few days, is that we're going to ask these same people what they have actually voted. Then we can check whether that corresponds to what the Dutch population has voted, so we know whether in that respect the panel is representative of the population; we then don't need to assume that the weights work. That will hopefully give a partial answer to your question: if they turn out to have voted the same as the population, that's good for the LISS panel as such, good for Marcel, but it would mean that the way we ask the questions may lead to differences, or that people really changed their minds in the last few days. Of course we have this selectivity problem in very many domains, but how much selectivity there is depends very much, I think, on the domain that you look at and on the nature of the questions you're asking. And I don't know whether we have any information on this kind of thing for voting intentions. I think parties are interested in polls that tell them they are doing well, because doing well helps to attract new voters; the winner attracts new voters. So that's easy, but I don't know whether parties are interested beyond that, so maybe they are also interested in many polls. I think the media are getting critical of all these polls, because they don't like the idea that different polls give different outcomes, and some of the media have even decided to pay attention only to the Peilingwijzer, the average of all of them. That is possible. We don't take it into account here, of course, but in principle you could analyze whether a gain today leads to another gain tomorrow, so you could look at the dynamics, sure.