We moved this tweets-and-games talk from a guest speaker who's in Singapore forward a couple of weeks. Yeah. So let's start with Abel, and then we'll have a second one with Lucas talking about... what was it? Quick chat. Quick chat, yes. One quick announcement: I won't be around until the end of the meetup. I have to rush off halfway through, but a friend of mine will be helping to clear up after. So if you have any questions, feel free to ask Mike; he's a member as well. Otherwise I'll see you guys next time. Thanks. Go ahead. So my name is Abel. I'm going to be talking about some work in which people tried to predict the outcomes of NFL games using data from Twitter. My English is not that great, it's kind of accented, so if there is anything you don't understand, please don't be afraid to ask me to repeat it. If you don't understand, probably most people don't either, so you'd be doing them a favor. And I haven't talked in public for a year or two, so this might suck a lot; I hope it's a good time. Also, as I was reading the paper again on my way to Singapore, it kind of turned a bit into a "papers we hate" presentation. That happens often, actually. Hopefully it's still a good time for everyone. So, first thing: the NFL is an American football league. Here I'd usually make some kind of smart European comment, but I'm going to be nice. Things you need to understand to follow the rest: there are two teams in every game. One of them is the home team, the one that has the stadium; they control the stadium and play half of their games there. The other one is the away team, who is visiting. That's a good thing to know. The teams score points, and whoever gets the most points wins. And it's a weekly schedule: you have a bunch of games that are played in week one.
And then you have another bunch of games that are played in week two, and so on. That's pretty much everything you need to know to follow the analysis these people did. Something else that may be good to keep in mind is that there is a ball, and if the ball goes in one direction, one team is more likely to score points; if it goes in the opposite direction, it's the other way around. That's good knowledge to have as well. I forgot to say this, but if you have any questions, feel free to interrupt. I don't have that many slides, so it's not like I'm going to go over time if you ask questions. So feel free to do that. Now I'm going to talk a bit about the kinds of predictions these people make. Basically, at the end of a game you have a score, and there are several things you can try to predict about it. The first one is the winner: just which of the teams has more points. That's an easy one. Another one that's more subtle is a prediction about a 50-50 spread. Some people don't like to make predictions about winners, because you can get winners right most of the time just by saying the home team is going to win; home teams win most of the games. Some people would rather make predictions about things that are a 50-50 split. I think the reasons have to do with variance, but I don't really understand them. It doesn't really matter; the point is that it happens. So what happens instead is that the people who accept the predictions, usually casinos or bookmakers or betting websites, set a spread. What that means is they say: okay, we have a team that's the favorite. Then, using some internal model, they ask: what's the chance that the favorite is going to win by one point or more?
What's the chance that the favorite is going to win by two points or more? By three points or more? And so on. As you keep asking, those chances get closer to 50%. So you pick the question that's closest to a 50-50 split, and that's the spread question. Basically, making a prediction about the spread is giving an answer to a question that looks like: is the home team going to win by at least three points? That three is the spread; the spread is the number associated with the question. Is this more or less clear, or should I repeat? So you're saying that three, if you see a spread of three, then three is the number where it's a 50-50 chance? For one of the teams, yes. Saying that the favorite team will win by that number of points or more is a 50-50 proposition, and people call that number the spread. And then the third one is the over/under, which is a prediction about the total number of points that are going to be scored in the game. Similarly, there is a model that the bookmaker or casino or whoever has, and they determine the 50-50 point such that asking "are there going to be more than 200 points?" is a 50-50 proposition. So those are the three kinds of predictions we are considering. Now I'm going to talk a bit about the information that the people in this paper used to predict the games. It's not just tweets; if you use only tweets, the results are not that great, so in many cases you need other things. I don't know if they planned that from the beginning of the paper, or if they did some experiments and realized that tweets alone are kind of shit sometimes. But the point is they actually use quite a bit of other information. The first thing they use is just the frequency of words in tweets.
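Stepping back to the three prediction targets just described, here is a minimal sketch of how each bet would be settled after a game. The scores, spread, and total line are made up, and the function names are mine, not the paper's:

```python
def winner(home_pts, away_pts):
    """Which side won outright."""
    return "home" if home_pts > away_pts else "away"

def covers_spread(fav_pts, dog_pts, spread):
    """Did the favorite win by at least the spread?"""
    return (fav_pts - dog_pts) >= spread

def over_under(home_pts, away_pts, line):
    """Did the total score go over the bookmaker's line?"""
    return "over" if home_pts + away_pts > line else "under"

# Hypothetical game: home team (the favorite) 27, away 20,
# spread 3, total-points line 44.5.
print(winner(27, 20))            # home
print(covers_spread(27, 20, 3))  # True: won by 7, which is >= 3
print(over_under(27, 20, 44.5))  # over: 47 > 44.5
```

The point of the spread and the line is that, by construction, `covers_spread` and `over_under` should each come out true about half the time under the bookmaker's model.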
So for each word, the percentage of tweets about a game in which that word occurs is a feature they use. Basically, if everyone tweeting about a team that's playing a game is using the word "injury", that probably means the team is going to do worse; and if it's an offensive player, that might mean they're going to score fewer points, which might be useful information for a prediction on the over/under. That's an example of how it could be useful. Something else they use, which is kind of smart, is the change in tweet volume: how much are people tweeting about the game? To give some intuition, you could say that if the fans are not tweeting much about their team, maybe they're not very motivated, and the team might do worse than expected. That's a possible causal story. They also use the actual spread and over/under 50-50 lines we've been talking about. An obvious way in which this is useful: if the spread question is "is the home team going to win by five points or more?", then you can use that to predict that the winner is going to be the home team, because that's implicit in being asked that spread question. And of course they use some statistics of team performance: maybe both teams have been scoring a lot of points lately, so the over/under might be more likely to land on over. That kind of information. Now a little more about the details of how they get that data. I think with this kind of project, the hardest part is actually getting the data. I wouldn't be surprised if they spent most of the time just getting the data and then ran the analysis in one hour.
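The word-frequency feature just mentioned, the fraction of tweets about a game that contain each word, can be sketched like this. The tweets and the helper name are invented for illustration:

```python
def word_frequencies(tweets):
    """Fraction of tweets in which each word appears (counted at most
    once per tweet, since the feature is 'percentage of tweets')."""
    counts = {}
    for t in tweets:
        for w in set(t.lower().split()):
            counts[w] = counts.get(w, 0) + 1
    n = len(tweets)
    return {w: c / n for w, c in counts.items()}

tweets = ["QB injury update", "injury looks bad",
          "great defense tonight", "defense wins games"]
freqs = word_frequencies(tweets)
print(freqs["injury"])   # 0.5 -> appears in 2 of the 4 tweets
```

Each game then yields one such dictionary, and the per-word fractions become the model's Twitter variables.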
So the tweets they get from the Twitter garden hose API, which I don't know if it's still open, gives you about 10% of all the tweets people make. To associate a tweet with a particular team playing in a game: to see which game it is, you just look at which week it is, so it's the next game that's going to happen. And to see whether it's one team or the other, they use hashtags. They manually map hashtags to teams; if a tweet has hashtags for more than one team, they just ignore it, and otherwise they associate it with the team its hashtag corresponds to. Pretty much everything else comes from the website nfldata.com. They also collect information about the spread lines that the casinos are setting; that's taken from the same website. As a side study, something they do that's kind of cool is look at which words appear in tweets after games are played, depending on the outcome of the game. The coolest thing they find is that when the visiting team loses, people complain about the referees: they tweet the word "refs". But when the home team is the one that loses, the complaints about the refs are not significant. And that actually correlates with something I mentioned before, which is that home teams win most of the games. People have studied why this is the case, and the consensus answer from several studies is that in most sports, home teams win most of the games because the referees favor them, subconsciously. You could say it's just because the crowd is there yelling at you; you don't want to go against the crowd. So it does make sense that when the visiting team loses, they complain about the refs.
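The hashtag rule described above (manually map hashtags to teams; drop tweets that match more than one team) could look something like this. The hashtag table here is invented for illustration; the paper built its own mapping by hand:

```python
# Hypothetical hashtag-to-team table (team codes are illustrative).
HASHTAGS = {"#gopats": "NE", "#patriots": "NE", "#steelers": "PIT"}

def team_for_tweet(text):
    """Assign a tweet to a team by its hashtags. Return None if it
    matches zero teams or more than one (the paper's rule for
    ambiguous tweets is to ignore them)."""
    teams = {HASHTAGS[w] for w in text.lower().split() if w in HASHTAGS}
    return teams.pop() if len(teams) == 1 else None

print(team_for_tweet("Big win tonight #GoPats"))          # NE
print(team_for_tweet("#patriots vs #steelers tonight"))   # None (ambiguous)
```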
It's more or less established that the refs go against the visiting team in a way that doesn't happen when the home team loses. So that's kind of interesting; it's a little section they have in the paper. Now, before you put the data collected that way into the prediction model, there is some data massaging to do. Basically, it's too much to have a variable in your model for every word that appears in any tweet anyone has made about a team, so they only keep words that appear in at least a fixed percentage of the tweets. That's one little thing they do. Something else they need to do, also kind of messy: they make some predictions using as variables both those word frequencies from the tweets and some information about the performance of the teams. I haven't counted, but I would expect there are thousands or tens of thousands of variables corresponding to the frequencies of particular words, while the team-performance variables number around 20. If you just mix those tens of thousands of word-frequency variables with the very few about team performance, your model is pretty much going to ignore the ones about the performance of the teams; they'll just get drowned out by the other 20,000 variables. So what you need to do is combine those 20,000 variables into a few that are in the same order of magnitude as the team-performance ones. Basically, what they do is consider the word frequencies for a particular game as a vector: if the first word in some fixed order appears in 20% of the tweets and the second one appears in 10%, the vector starts (20, 10, ...). Then they use a linear map to reduce that vector to a lower-dimensional vector, and each component of the reduced vector becomes a new variable. Is that more or less clear, or should I repeat?
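Whatever linear map they fit, applying it is just a matrix-vector product. This toy sketch uses a made-up 2x4 projection matrix; in practice the map would be fit to the training data (for example by PCA/SVD on the frequency vectors) rather than chosen by hand:

```python
def project(matrix, vec):
    """Apply a linear map (given as rows of `matrix`) to a feature vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

# Toy example: collapse a 4-dimensional word-frequency vector to 2 dimensions.
# The rows of W are invented for illustration.
W = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.5]]
freq_vec = [0.20, 0.10, 0.40, 0.00]   # word frequencies in some fixed order

print([round(x, 2) for x in project(W, freq_vec)])  # [0.15, 0.2]
```

The two numbers that come out are the new variables, now few enough to sit alongside the ~20 team-performance features without drowning them out.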
And that's another trick they need to use. Then, something else: to look at those changes in tweet volume we've been talking about, you can look at absolute quantities, like "there were 1,000 fewer tweets about this team this week", or you can look at it as a relative thing: "there was a 5% decrease". I don't think it's clear which one to use in different situations. The next thing to mention is which algorithm they use to make those predictions: a logistic regression classifier. That's what the logistic function looks like. For every variable, it's more or less simulating a threshold function. What your training algorithm does is associate a weight and a threshold with each variable, and if the value of the variable for a given game is above the threshold, you add that weight to your estimate of the outcome. Of course it's not exactly that, because it's a bit more smooth, but that's basically what it's about. So in this plot, the x-axis is any of the variables we talked about; it could be the frequency of a particular word in the tweets made about the game, say the word "attack". For the y-axis, you first arbitrarily fix one of the outcomes of the prediction; say you're trying to estimate the probability that the over/under lands on over (you could do it with under instead). Then for each variable in your model you have a curve like this, and the y-value is how much probability that variable adds to your estimate that the outcome is over. Of course, the curve could be mirrored; instead of going from 0 to 1 it could go from 1 to 0. And you have a weight for each variable, so it doesn't really go from 0 to 1; it goes from 0 to the weight.
But basically you're going to have a curve like that for every variable. When you're making a prediction for a game, for each variable you go to its graph, see what y-value corresponds to the value of that variable, and add that to what you think are the chances that the outcome of the game will be over, in the over/under case. Sorry? [Q:] Yeah, that's the probability. You're basically trying to find out which variables are more important than the others? [A:] Yes, exactly, because the variables that are more important are going to have a larger weight. For the variables that are more important, the value you get from that process might go from 0 up to maybe 0.1. [Q:] So the curve will be shifted? [A:] Well, the curve can also be shifted, but I'm talking about something different: the vertical scale, the size of the step, rather than its position. The step will go from 0 to 0.001 if it's a variable you don't care about, because that variable doesn't impact your prediction. If it's a variable you care about, the curve will go from 0 to something like 0.1, because it accounts for 10% of the answer. [Q:] So do they run the model once leaving out one variable, and then run it again leaving out another variable? How does the learning process work? Do they eliminate the variables one by one? [A:] I'm not sure how the learning process works, or what the best way to train this kind of machine learning algorithm is. Maybe something like that is going on. They should be throwing everything into the model and then training the sigmoid function, 1 / (1 + e^-(β0 + β1·x1 + β2·x2 + ...)).
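A minimal sketch of the logistic model being described: each variable contributes weight times value, the sum (plus a bias) is squashed through the sigmoid, and a near-zero weight means that variable barely moves the estimate. The weights and inputs here are invented, not fitted:

```python
import math

def sigmoid(z):
    """The logistic function: a smooth threshold from 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_over(features, weights, bias):
    """P(outcome = 'over') under a logistic model:
    sigmoid(bias + sum(w_i * x_i))."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Made-up weights: the first variable matters, the second barely does.
weights, bias = [2.0, 0.01], -0.5
p = predict_over([0.6, 0.6], weights, bias)
print(round(p, 2))   # roughly 0.67
```

Training (which the paper delegates to an off-the-shelf library) consists of choosing the weights and bias that best fit the observed outcomes; the per-variable "step size" the talk describes is exactly the magnitude of each weight.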
You throw all your variables into the model, train your logistic function on all of them, and you get the corresponding weights. [Q:] For the model itself, it basically runs through every scenario, covers all the scenarios, and then you take the output of the function and try to predict the outcome, which is between 0 and 1. [A:] Also, what's the best way to train the model depends on the amount of data you have, because you cannot afford to examine too many combinations when you have a lot of data; it would take too long, so you need to be smart. But I'm not sure exactly how. I don't think they implemented it themselves; I'm pretty sure they just used an off-the-shelf library for that. [Q:] It's not a neural net or something? [A:] No, it's not a neural net. There's probably some related math in parts of it, but it's not a neural net; they just use logistic regression. They don't consider alternatives: they do a lot of work on which variables to throw into the model, but not much on which model to use. They just take this one. [Q:] Have they tried other classic models? [A:] I don't think they do that in the paper. [Q:] And is that how these papers usually go? [A:] That might be a different question. I assume it's kind of standard. I remember doing similar things before, and I just used a logistic classifier because that was the easiest. It kind of makes sense. [Q:] But you said they throw a lot of care into which variables they put into the classifier. Do they sort the variables and then manually select some to put in, to see which gives the best output, rather than taking a selection of variables and running it through some automated statistical procedure to find out which work best? It's a bit more brute force, by hand, is that what you're saying?
[A:] Well, they do it kind of by hand, but not quite, because there's actually another paper they cite where the selection is done by an algorithm. They classify variables into two kinds: the team-performance variables, which are very few, and the Twitter variables, one for each word, which are the ones this paper is focusing on. There are basically three cases. One is when they consider the Twitter word frequencies and nothing else. Another is when they consider the Twitter word frequencies, do that linear reduction, and then throw in some features of the teams' performance. And then there's the rate of tweets, which I think they also consider both by itself and combined with some team-performance features. For those team-performance features they try several options, but the combinations they try are taken from a paper which itself has an algorithm for finding which combinations of variables are best. So their choice of variables is not completely arbitrary; they took it from a paper that tries to be smart about finding sets of team-performance variables that are predictive. Does that answer the question? [Q:] I think so. [A:] So the predictions are made in an online fashion. Basically, they start at week four of the 2012 season, because the first weeks of the season are thought to be kind of random; there's weird stuff going on in the first weeks. So they start at week number four, and then for each week they train the model using data from the two previous seasons, plus the current season (2012, when they wrote the paper) up to the previous week.
So the values of the input variables, as well as the outcomes of all those games, are what they use to train their prediction model. The model also has some hyperparameters that are given to the training algorithm; they just brute-force the best values for those, chosen so that the algorithm makes good predictions on the previous two weeks. That seems kind of ad hoc. I mean, it seems reasonable, you're training it to adapt to local trends, but maybe you should choose them to maximize performance on the whole set. I don't know. But that's the way they train their prediction algorithm. Any questions? So now I'm going to discuss the results they get. If we look at the predictions they make about who's going to be the winner, many of them don't look very impressive, because, as I said, if you always predict the home team is going to win, you're going to get it right 57% of the time. So if Twitter unigrams are giving you 52%, that's not great. And if you combine the Twitter word frequencies, reduced to one variable, with the F-something features that represent the performance of the teams (except F1 and F2, which represent the spread and over/under lines), you get 47.6%. That's not very nice. I can see why they use the Twitter rate variable, which does better than that. Something else you can see: you can use the spread line itself to predict who is going to win, namely the favorite. If the spread line says it's a 50-50 proposition that a team will win by five or more, well, that team is most likely going to win, and you get that right 60% of the time. That's kind of cool to see; that's F1 at the very top. And then there is something that's kind of annoying me, which is that they don't put any noise levels on this. So let's forget, for the rest of this slide, about the winner column.
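The weekly, online training scheme described above (for each week, fit on everything strictly earlier, then predict that week) can be sketched as a walk-forward loop. The model here is a trivial stand-in that always picks the home team; the function names and the toy schedule are mine:

```python
def walk_forward(games_by_week, fit, start_week=4):
    """Train on everything strictly before each week, then predict that
    week. `fit(train_games)` returns a predict(game) function, so no
    future data can leak into any prediction."""
    predictions = {}
    for week in sorted(games_by_week):
        if week < start_week:
            continue
        train = [g for w, gs in games_by_week.items() if w < week for g in gs]
        predict = fit(train)
        predictions[week] = [predict(g) for g in games_by_week[week]]
    return predictions

# Stand-in model: always predict the home team (right ~57% of the time).
fit = lambda train: (lambda game: "home")
games = {1: ["g1", "g2"], 2: ["g3"], 3: ["g4"], 4: ["g5", "g6"], 5: ["g7"]}
preds = walk_forward(games, fit)
print(preds)   # predictions only for weeks 4 and 5
```

Whether you actually run this live each week or simulate it afterwards makes no difference, since each week's fit only ever sees earlier weeks.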
WTS, by the way, means "with the spread"; it's the 50-50 spread prediction we've been talking about. So basically what we have here is 32 series of predictions about events that are more or less 50-50: 16 algorithms, each making predictions for two things. So I think something you should consider is this: assume our algorithms had no predictive power at all, that they just flip a coin, and that we have 32 runs of that coin-flipping algorithm. What's the chance that one of the runs, by luck, gets a bunch of games right? I computed those numbers, and hopefully they're right: I think you should expect about 5.5 of those runs to reach 53.6% even if your algorithm has no predictive power. And we can count here how many reach 53.6%: it's 1, 2, 3, 4 and 5. Maybe I missed one, but it's kind of disappointing. I like the idea of using Twitter unigrams, but it would be cool if they acknowledged that. Also, the expected number reaching 57.2% by luck is about 0.8, and there are two that do it, so that's a little more optimistic. Something that stands out is that the best numbers tend to be the ones that use the Twitter rate, the rate of tweets, combined with some features of the teams' performance. That's just six prediction series, and the successes we see there, the 58.2% and the 57.2%, might be worth looking into more. So that one is a little bit better. But overall, at the 53.6% level, they're not doing better than random, which kind of sucks. So a few more things, winding down the presentation. First, if anyone is wondering, "okay, I'm going to implement the Twitter-rate thing, and maybe if I get lucky I'll make money": some things might make that hard. One is that the market digests information. This paper is not from yesterday.
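The coin-flip baseline the speaker computes can be reproduced with a binomial tail: if 32 independent coin-flip predictors each call some number of 50-50 games, how many are expected to beat a given accuracy threshold by luck alone? The number of games per run below is an assumption (the talk doesn't state it), so the exact expectation differs a bit from the 5.5 quoted:

```python
import math

def p_at_least(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

def expected_lucky_runs(runs, n_games, threshold):
    """Expected number of coin-flip runs whose accuracy reaches `threshold`."""
    k = math.ceil(threshold * n_games)
    return runs * p_at_least(n_games, k)

# Assumed setup: 32 runs of roughly 200 fifty-fifty predictions each.
print(round(expected_lucky_runs(32, 200, 0.536), 1))
```

By linearity of expectation, the answer is just 32 times the single-run tail probability; the point of the exercise is that several "winning" rows in the results table are consistent with pure luck.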
It's one or two years old, maybe three. The people who set those spread and over/under lines read this paper too, so they might have taken it into account in the way they set the lines. Or maybe not; maybe they have their own reasons to ignore it. But there's a chance they took it into account. Second, these people are getting those spread and over/under lines from nfldata.com. Sure, the lines are there, but that's a snapshot from a website: from a particular place at a particular time. Those might be the lines that a particular bookmaker, one you might not even be able to bet with, is offering just before the beginning of the game, or whatever point nfldata took the spread lines from; and if you're making a prediction, you're usually not making it just before the beginning of the game. So it might take different mathematics to beat the spread just before the game versus the spread one week before the game. And then of course there is luck; there's a lot of luck involved in this kind of stuff, as we've seen with the coin-flipping runs. One more thing: if this talk was interesting, I saw that there is a related course on edX. It might suck, it might not; I intend to start it, and I thought people might like it. This is the name of the paper and the address. I also thought it would be cool to recommend this Twitter account. It's not about the NFL, but it is about fan stuff, and it's also about Markov chains. I don't know if anyone here has seen it, but basically they combine emails from recruiters for software jobs with posts from this other website, Erowid; I don't know if anyone has heard of it. It's a website where people talk about their experiences with chemical compounds. And they combine the two, basically looking for ways to splice sentences together so that they still make sense.
Taking part of a sentence from a trip report and the other part from an email from a recruiter. It's usually fun; I like it. I can put it on the screen so you don't have to look at your phone. So, I don't know, I think I like this one. They have a lot of them. I like this one too; it's kind of simple: "The vast parking lot I was planning. Guy, I hope you're in the hospital for the next couple of days. I will reach out." That's kind of nice. Oh, I like this one: "Co-creating a state of categorical anxiety and directionless fear until six seconds. Don't hesitate to reach out to see if you would like to set up a brief questionnaire and possibly linger in psychological difficulties." Oh, and this one: "Stand for alumni with a large amount of psychological preparation, extensive practice in meditation and identity control." This is probably the best of all time. It's very nice. [Q:] I'm actually very curious about why they used unigrams, because I feel like a lot of information is lost in human speech: words that negate meanings, for example, and that's just the most basic case. You know, sarcasm... You'd probably get a lot more information there, and I think if you mined that, you'd get much more accuracy. And I was thinking also about pronouns: it's not the same to say "we lost" as to say "you lost". It makes a big difference. They didn't really do any sentiment analysis? [A:] Exactly. No, no, that's the next level already, how positive the sentiment is. I mean, they are expecting that the sentiment will be inferred by the training algorithm.
I mean, it's reasonable, I guess, to assume that the training algorithm will detect that the word "injury" in the tweets of people who follow a team is correlated with the team not winning the next game. But strictly speaking that's not sentiment analysis; it's just an emergent thing that might happen. It's not explicit at all. That might be interesting. [Q:] Making money: I always feel like this is very interesting, but at the end of the day, if somebody can figure this out, somebody else can too. That person probably also knows this team is going to lose, or can expect it. [A:] Yeah, the thing with sports predictions is that the market... how do I say this? I'm not an expert in economics, but a way to phrase it that hopefully doesn't sound too stupid is that there are both rational and irrational agents in this market. You have people running these algorithms, and then you have a bunch of people making bets at 3 a.m., drunk. And those people can also move the market. So that makes things a bit complicated. [Q:] The expectation would be that Twitter would probably be quite closely aligned with the prediction of the betting market, because I believe most bookmakers adjust based on how many people are betting on either side, which is basically crowdsourcing the answer. And this is almost doing the same thing using Twitter. So I'd imagine the crowdsourcing of bets and the crowdsourcing of Twitter have a high degree of correlation, but a correlation in being wrong just as much as in being right. Though I think the bookmakers reacting to supply and demand is possibly a lagging indicator.
But I think they're saying this could be a leading indicator of performance; the bookmakers may lag behind the leading indicator of social media. [A:] I can't tell you much about how bookmakers operate, whether they adjust purely on the basis of how much people are betting on each side. [Q:] Well, that's why they're adjusting: to minimize their risk. So it would be interesting to see that actually plotted over time: a graph of where the market is for a particular event as it nears game day versus this algorithm, to see whether there is a time-series lag or lead. [Q:] How large was the number of tweets? Because it matters whether you have 10,000 or 10 million. [A:] Maybe it's in the paper; let's see if they have it. So, pre-game... I think "weekly" is the one we're looking for. Anyway, it's in the hundreds of thousands. You're dividing that between a few hundred games, but I think it's still more or less significant. Any more questions? [Q:] About Markov chains: maybe you should check out SCIgen. That was the classic thing where people at MIT automatically generated scientific papers and submitted them to various venues. [A:] Yeah. I've actually read output from the Postmodernism Generator, which is a related thing, and it's actually quite believable. [Q:] And it had charts as well, charts and tables. Is that the Sokal affair one? The postmodern one? [A:] Yeah. And there were a few people, more recently than that, who actually got some of these papers into IEEE conferences. I mean, they were very sketchy IEEE conferences, but they got into them. It's just that nobody read the papers.
[Q:] What about the predictions? I'm assuming they were done before each game, and it wasn't a backtest. [A:] They only use previous data. I don't know if they did it every week during the season or all at the end, but the predictions about a particular game ignore any data from the future. I don't know whether that's because each prediction was actually made, temporally speaking, before the data existed, or because they just don't take the later data into account, but it's done in an online fashion. [Q:] Was it all done in retrospect? [A:] Even if it was done in retrospect rather than live, it shouldn't matter, really, because you're ignoring the future data. If we look at what they do: either they did it every week or they simulated it. Basically, the games happen week by week: week one, a bunch of games; week two, a bunch of games; and so on. So what they do is: okay, let's try to predict the games from week five, and let's train our algorithm using data up to the previous week, up to what happened in week four. Then let's make our predictions. Then, okay, now it's week six: let's train the model again using data up to week five, and make our predictions for week six. Whether you do that at the end of the season or in each week, it shouldn't matter, because you're not using anything after the game. No more questions? Thank you.