Welcome back to the final installment of my 2018 All-Star Break demonstration series, looking at the ability to predict the winners and losers of baseball games. Today is the final day of the week. It's Friday, kind of still the All-Star Break. The Cubs and Cardinals, I think, played a make-up game last night, so there's been one game so far, but the season really kicks off again this afternoon and this evening. So, again, what I want to do in this module is take the results of our models and run them back through all of the betting line data. If we were to, say, invest $100, or bet $100 (probably not a very good investment), on every game using the favorites as predicted by our different models, how much money would we make or lose with each model?
So, at the end of the last module, one of the things I noticed is that in doing my commit and save and everything, things got a little bit out of whack and I accidentally lost some code. So, it's always critical that you save your file before you commit it. One of the ways you know in RStudio that things haven't been saved is that the file name up here in the tab will be red. If you then save it, it goes back to black. Okay. So, you always want to make sure that that's black when you commit your code.
The other thing that I commented on in the last demo was that the names had been flipped coming out of the betting site. For all of those columns, team 1 was the visiting team and team 2 was the home team, whereas with everything else we did, team 1 was the home team and team 2 was the visiting team. So, I went back through and flipped the names of the teams, the scores, and the probabilities that I had pulled from the betting line website. And then I regenerated that file that we pulled down from the betting website.
The other thing I noticed is that I started getting a lot of time-outs in accessing that website with the rvest read_html function. I suspect their traffic picked up since the baseball games started. So, I've committed the betting data to this repository. If you're following along, you should be able to get that as well. But if someone's watching this in the future during the baseball season, it might be a little bit difficult to regenerate that. There's probably a better way to set up those reads so that if you get that 504 error, it'll keep trying again. But we got it to work and we'll be good to go. So, all of that is now committed and part of the branch for the issue on validating the betting line module.
There are a couple of other things that I've noticed. As I was going through the code in the last tutorial, I inadvertently switched back and forth between things like win probability, betting line, and money line. Well, it's a money line model, and so I want to make sure the code refers to the money line model. So, we need to replace win probability model or betting line model with money line. That should hopefully be simple. So, that's one bit of housekeeping we need to do.
Another bit of housekeeping is how we handle double headers. The way we've been handling this is that we do all our joins by the date, the home team, the visiting team, and the scores. Now, looking back through the data as I lay awake last night worrying about this, I realized that there are some double headers where they have the same teams, of course, and the same score. And so, when we do those joins, we could get some funkiness. So, we want to add a game number that indicates which game of the day was played. Hopefully that won't be too hard. And then, we want to create another issue, which will be to run predictions through betting data.
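On the time-outs mentioned above, here is a minimal sketch of one way to keep retrying a read that fails. The helper read_with_retry is hypothetical (it is not part of rvest or of the code in this series), and the flaky_read function just stands in for a page read that returns a 504 a couple of times before succeeding.

```r
# Hypothetical helper (an assumption, not from the series): call read_fn()
# until it succeeds or max_tries attempts have failed.
read_with_retry <- function(read_fn, max_tries = 5, wait = 1) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(read_fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    if (attempt < max_tries) {
      Sys.sleep(wait * attempt)  # wait a little longer after each failure
    }
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result))
}

# Stand-in for a flaky page read: fails twice, then succeeds.
calls <- 0
flaky_read <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("HTTP error 504")
  "<html>ok</html>"
}

page <- read_with_retry(flaky_read, wait = 0)
```

With rvest loaded, the same wrapper could presumably be used as read_with_retry(function() read_html(url)), where url is whatever page is being scraped.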
And so, for this issue, we need to do a couple of things. We need to convert the money line to a payout. So, if you bet $100, how much money are you going to get back if you bet on a certain team? We then need to join payouts with model predictions. Then we need to get the annual payouts for each model. And then I want to plot the cumulative winnings or losses for each model since, I think it was 2009, which is as far back as we were able to get the data. So, we'll go ahead and submit this. So, we've got three issues that we need to tackle today.
I'm going to start with the first one, making sure that the code refers to the money line model. So, we'll come back and we'll do git checkout money line fix. Sorry, this needs a -b. Excellent. So, we need to go through here, and this mainly picks up with what we were looking at in the last tutorial. If we come down here, there's money prob, favorite money won, money prob, that's good. Money line, so we have win prob, but this is the name of the data frame. So, we can get rid of that. I think what I'll do is go ahead and highlight all of this and run it in R. Excellent. And so, this is the favorite win prob. We really want the money prob. So, let's look for favorite win prob. So, favorite win prob, right, was the data with all of the other predictions. So, that's right.
So, let's go through this and see what we've got. We're going to run all this. And if we get to the end of running this, we see we have favorite money won, favorite money prob. Okay, so that's good. If we save this and run this, and now we look at favorite win prob, we've got the money lines. That's great. We now want to make it tidy. So, we had win prob here. We really want money. I'm going to call it the money model. And so, that's good. And we need to change win prob here. So, let me look for win prob.
And so, here we have money, money, and then let's search for win prob. So, we want money line. And this should also be money line. So, if we now run all this jazz, we see we have the money line model, which is great. And then we also have the comparison of the predicted versus the observed, where we again have the money line, okay? My colors here aren't very contrasting or strong, but I think we get the point that the three models do a pretty similar job to each other, at least between, say, 0.5 and 0.75. Excellent. So, we made that change. Not a big deal. We'll add this. We'll commit it. And we'll say, correct win prob to money line, closes #4. So, we want to double check that that was issue four. Yep, issue four. And also, I want to double check that I've got this saved. So, git checkout master, git merge money line fix. That's done. So, we're going to push, and we look at this, and we've closed the issue. Excellent.
So, now we push on to handling double headers. I think this is going to get a little bit harder. If we come back to the top here and we look at game data, we'll go to our console. We're going to minimize this. And we do game data. Just move it up. Why don't we try game data, group by date, team 1, team 2, and then I'm going to do summarize n equals n, open and close parentheses. So, this counts the number of games that were played between these two teams on these dates. There's 200,000 of these. So, I'm going to then arrange in descending order by n. And we see that way back in 1890, there were some triple headers. That's pretty intense. So, again, what's happening here is that we take the game data data frame and we group all those rows by the date and the teams. If the date and the two teams are the same, then that kind of becomes its own separate data frame.
And then, with each of those, we summarize by counting the number of rows in that data frame, and then we arrange in descending order. If we want to, we could also add an ungroup after that summarize. So, that's a good model to have for what we want to do, because we don't necessarily want to add an n. We want to add a game number to each of the rows. So, we can do game data, group by date, team 1, team 2. And then we'll do mutate game equals 1 colon n, open and close parentheses. And then we'll do an ungroup. Run that. We see here at the end is a game column.
So, to test this, I'm going to do a filter on that date in 1890. We will then select date, team 1, team 2, game. We need that double equals sign in our filter. And what we see is that there was a double header that day. So, LA Dodgers and Pittsburgh, game 1; LA Dodgers and Pittsburgh, game 2. There are a couple of other double headers here. And then between the Dodgers and Pittsburgh, yeah, that's right. We had 1, 2, and 3. And there were quite a few double headers that day. I guess back in the day they used to play double headers quite frequently. So, this does exactly what we want.
So, we need to take this code, without that filter, and add it to our pipe for game data. If we run this now to generate game data, we again see everything we'd expect, with that game column tacked on at the end. As we go through here, we now need to include that as part of our join. So, this was where we were calculating the win probability. We're going to add game here in our select statement as we build favorite win prob. Again, if we look at favorite win prob, we now see we have that game column. This is where we calculated the win loss live and win loss season. And then this is where we do the join. So, we're going to add game here.
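The two steps just described, counting games per date and matchup and then numbering them, can be sketched with dplyr as below. The toy game_data and its column names (date, team1, team2, score1, score2) are assumptions based on how the columns are spoken in this demo. Note that 1:n() numbers games in row order, so this assumes the rows for a given day are already in the order the games were played.

```r
library(dplyr)

# Toy stand-in for game_data; the first two rows are a double header
# with identical teams and identical scores.
game_data <- data.frame(
  date   = as.Date(c("2018-07-09", "2018-07-09", "2018-07-09", "2018-07-10")),
  team1  = c("BAL", "BAL", "CHC", "BAL"),
  team2  = c("NYY", "NYY", "SFG", "NYY"),
  score1 = c(5, 5, 2, 1),
  score2 = c(4, 4, 3, 6),
  stringsAsFactors = FALSE
)

# How many games did each pair of teams play on each date?
games_per_day <- game_data %>%
  group_by(date, team1, team2) %>%
  summarize(n = n()) %>%
  ungroup() %>%
  arrange(desc(n))

# Add a game number within each date and matchup.
game_data <- game_data %>%
  group_by(date, team1, team2) %>%
  mutate(game = 1:n()) %>%
  ungroup()
```

With the game column in place, the two otherwise identical double header rows can be told apart in a join.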
And after date in our select, we're going to add game as well. I've got to run all this stuff, don't I? Game is missing from the right-hand side. That's because up here, I also need to select game. So, this is date, game. This should work. Excellent. And so, if we look at favorite win prob, we have the season, the date, the game, the teams, the scores, the 538 model, the WP live model. And then we're going to also join in the season. And so, here we're going to add... I guess we don't have game for this, right? So, win loss season doesn't have games, because that's the record at the end of the current season or the end of the previous season.
And so, if we run this, oh, I see a problem already, that we need to select the game. And this. So, we're losing it somewhere. If I run line by line here, favorite win prob is losing the game. Oh, I guess I never ran that. Okay. I'm going to highlight everything going back up to the top. Sometimes, as I'm doing this, I know I'm missing chunks of code to run. That's why it's good, at the end of everything, to run with a blank slate of variables and a new session to make sure everything works right. And here we go. We've got it. We don't have the error message anymore. We've got the season, the date, the game, the teams, the scores, 538, the WP live, and the current and previous season models. Excellent.
So, then this is where we get into joining the money line data with our current model. This is where we made that name conversion. We're going to read in the money line data, and we need to do the same thing here where we do our group by stuff. I'm going to fly back up to the top here where I had this code. And we're going to come back down to where we were calculating the favorite win probability. And so, we want to... Oh, I found a typo. Team 1, team 2. So, I need to come back up here and change this to team 1. And we'll have to rerun everything because I punted that pretty badly.
Where was I? Flying around too much. So, this is where we read all that in. We look at that. We've got the date and we've got that game column over here on the right side. Excellent. And then we want to pipe this into our mutate, where we're going to figure out the probability and the winner. We're going to do an inner join to convert the names. And then we're also going to do the inner join to add the game, to make sure that we've got the right game for the day. And then here, we're going to make things tidy. I think everything here should be in good shape. So, if we run this, tidy win prob, right? We've got our game. Excellent. And if we look at overall win prob, I don't think this changes the results very much, right? We still have pretty similar values. The money line does about 1% better than the FTE and does about as well as the WP current, at least in terms of predicting the favorite.
So, I'm going to save this. I'm going to source the whole thing. So, this will run that whole analysis.r file. Excellent. So, we see that second plot. We can tab here to go to the previous plot, where everything has been pretty steady for the last nine years or so, where we have that betting data. Great. So, this incorporates the double headers. And so, if we save this... I forgot to check out a branch. So, I'll go ahead and do that. git checkout -b double header. And I will do git add analysis.r. git commit -m, add game column to reflect double and triple headers, closes #5. So, I'm going to double check that. git checkout master. git merge double header. git push. And again, if we look here, we've closed the issue.
And so, if we look at our issues, we've got the final issue, the one that we're actually interested in working on today. So, we want to convert the money line to a payout. We want to join the payouts with the win predictions and get the annual payouts for each model.
So, something that occurred to me that we want to add is that we need to know who the favorite was. So, create a column that indicates which team was the favorite. As we currently have it, we're only saying whether or not the favorite won and the probability that they won. Well, if we want to figure out which team to bet on, we need to know who the favorite was so that we can then extract that payout from the money line data. So, I'll go ahead and update this comment. That's great. And so, now we're working on issue six. So, we're going to git checkout -b, and I'm going to call it betting. So, here we're going to simulate what would have happened if we'd bet $100 on each game.
One of the things I'm going to do is start to break up our code a little bit. The analysis that we've got here I really like, and I want to keep that. I think this reflects very nicely what we've done in the first three days. And so, I'm going to save this tidy win prob. I'm going to do a write CSV and ship out tidy win prob, and I'll call this model data dot CSV, in the data directory, and that needs to be put in quotes. So, if we save that and then come back to our terminal and do ls data, we see now that we've got our money line data and our model data. So, I'm going to git add analysis and then model... I'm sorry, the data, with the message write out data to file. I forgot to save, so I'm going to git add that again and then do a git commit --amend, and it will open up a commit message window, and we close that and we're good. All right. Always remember to save your files. This is real life.
All right. So, I'm going to now create a new file, a new R script, and I'm going to steal some stuff from this to make it. I'm going to call this bettingsimulation.r, and this is July 20th, and the purpose of this is to simulate how much we'd make or lose based on betting $100 on each of the favorites from the models. So, we're going to use the tidyverse.
I'll leave in lubridate and broom. We can leave in wesanderson. We're not going to need rvest. So, we're going to save this, as I said, as bettingsimulation.r. I'm going to go ahead and commit this. Say initial commit. So, the first thing that we want to do, as I said in the issue tracker, is convert the money line to a payout. And so, what did I call this variable yesterday for the money line data? I guess I didn't call it anything. Okay. So, I'm going to call this payout, and we're going to read in the CSV. As I've done in the past, I'm going to keep this separate and we'll assign it to the payout variable later, but for now let's develop the data frame.
So, I'm going to want to make this tidy, and one of the things that we also need up here is this name convert, where we will also do some of this. And so, I'm going to copy over some of this code, but realize that ideally this code perhaps should be in the code file where we generated the money line CSV file. It's not very DRY. DRY is the idea of don't repeat yourself. So, it's not DRY to keep repeating yourself over and over again, and it just makes things a lot harder to interpret. So, I'm going to copy this and I'll go ahead and run these lines. Actually, take this out, and this is what I had down here. Read CSV, drop NA, and we're going to see what happens when we get this far.
So, at this point we have the date, the teams, the scores, the money lines, the game, the money probabilities (the probabilities based on that money line), and then whether or not the favorite won and the probability of the favorite winning. And here then we do the joins. Let's remind ourselves what happens here. Score 1, score 2, all these things. So, I think I want to keep team 1 and team 2 in. And, oh, that's right, we've got it here. Okay, so we run this.
We get the date, the scores, the money line, the game, the money probability of team 1 winning and of team 2 winning, whether or not the favorite won, the probability of the favorite winning, and then the names of team 1 and team 2. So, we could probably also, for our purposes, get rid of money line, oh, money... let's just leave it this way for now. So, something we want to do, perhaps in here in the mutate line, is to say fave money payout. And for this we're going to say get payout. This would be the payout for each team. So, this would be fave money payout 1, and we'll also have fave money payout 2, and we're going to give it money line 1 and money line 2. That should work.
So, we need to make a function called get payout. The get payout function takes what we'll call the money line, and the question is, what is the formula for the money line? I'm going to also put a variable in here, say bet equals 100. So, the default value of the bet is going to be $100. If the money line is positive, then the payout will be the money line. So, if the money line is +150 and you bet $100, you're going to make $150. If it's negative, so we might get, say, minus 120, then you have to bet $120 to make $100. And so, we can say the payout is going to be 100 divided by the money line. And of course, we want to make this negative so the result comes out positive. Let's test this out. I think this is right. So, if we do get payout and I put in minus 200, then if I bet $200 I should get $100. So, if I bet $100, then I should get back $50. And it doesn't say anything because I haven't written good R.
So, what we're going to do is say if else money line less than zero, then the payout is going to be minus 100 divided by the money line. And if it's positive, so if it's not less than zero, then the payout will be the money line. So, we can get rid of all this jazz. So, we run get payout, and this should be 50. And it's .5. So, this needs to be multiplied by the bet. So, if we do get payout of minus 200, we get $50 back. If the money line was, say, minus 300, then we'll get back $33.33. If it's plus 300, we should get back $300. So, get payout works.
And I want to update our mutate lines. That all looks pretty good. Missing argument to function call. Oh, I must have... money prob 2 not found. This should be money line 1, money line 2. And so, now we see that we've got fave money payout 1 and fave money payout 2. And again, if we save this as payout, I want to do payout, select date, team 1, team 2, fave money payout 1, fave money payout 2. And I'm going to add money line. I'm good at introducing typos, aren't I? Great. So, we've got money line 1 and the payout. And so, yeah, if it's positive, we get that value back. If it's negative, then we generally get a number less than 100 back.
Okay, so the next thing we're going to want to do is make this tidy, so that for every day and every game, if we know the favorite from the other models, we can do a join between that favorite and the date, game, and team name here to get back the payout for that team on that day and in that game, okay? And so, like we did in the last module, we're going to mutate to do team payout 1 equals something and team payout 2 equals something. So, we'll do paste with team 1 and, I guess this should just be money payout 1. It's not for the favorite. So, money payout 1, money payout 2, and we'll separate with an underscore, and I'm going to copy this.
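The get payout function worked out above can be sketched as follows. The name get_payout and the argument name moneyline are assumptions based on the spoken names. Scaling the positive case by bet / 100 is also an assumption: the demo only ever uses the default $100 bet, for which a positive payout is simply the money line.

```r
# Profit on a winning bet, given an American money line.
#   negative line (favorite): you must stake |line| to win 100
#   positive line (underdog): you win `line` for every 100 staked
get_payout <- function(moneyline, bet = 100) {
  ifelse(moneyline < 0,
         -100 / moneyline * bet,   # e.g. -200 -> 0.5 * bet
         moneyline / 100 * bet)    # e.g. +150 -> 1.5 * bet
}

get_payout(-200)  # 50
get_payout(-300)  # about 33.33
get_payout(300)   # 300
```

Because ifelse is vectorized, the same function can be applied to whole money line columns inside a mutate.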
So, team 2, money payout 2, and we're going to then do a gather, and I'm going to call the key 1, 2. It's just going to be a dummy column that we're going to get rid of eventually, and the value is going to be team payout, and we're going to gather together team payout 1 and team payout 2. Again, I'm going to put this on separate lines just to run so that I don't have to worry about all this jazz, and so we're going to select date, team 1, team 2, or actually I don't want team 1, team 2. I'll do that and then I'll do 1, 2. Team payout 1 not found. Team payout. I want team payout. So, this then shows us the type of output. So, the date, the 1, 2. We're going to drop that 1, 2, and we're going to then split team payout. So, we can do separate team payout into team and payout, and we're going to do convert equals true.
Additional pieces. So, I'm not sure what's going on. Let me simplify things. I'm going to do some of this already anyway. So, I'll do select minus money line 1, minus money line 2, minus payout 1, minus payout 2, and minus 1, 2, and I'm going to leave in team 1 and team 2 because I'm going to use those to help make my join specific, to indicate which game it was. So, we've got a kind of index to make our games unique by the date, team 1, team 2, and the game number, as well as the scores. I guess that should be money payout 1, money payout 2. So, I forgot to put in the separator value. So, sep equals underscore. There we go. And so, then we've got these three extra variables. Let's see. Great.
So, we see the date, the score, the game, the money probability 1 and 2, whether or not the favorite won, the favorite money probability, team 1, team 2, and the team that this is the payout for. Okay. So, we're going to use team 1 and team 2 as well as the date, the scores, and the game to grab that unique game, and then if the favorite was Boston, we'll join on team as well. If Baltimore was the favorite, then we'll grab Baltimore, and then the payout will be whatever.
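The paste, gather, separate sequence described above can be sketched like this. The toy data, the column names (money_payout1, team_payout1, one_two, and so on), and the payout values are all assumptions for illustration. gather and separate still exist in tidyr, though newer code would often use pivot_longer instead.

```r
library(dplyr)
library(tidyr)

# Toy wide payout table: one row per game, one payout column per team.
payout_wide <- data.frame(
  date          = as.Date(c("2018-07-08", "2018-07-08")),
  game          = c(1, 1),
  team1         = c("BOS", "CHC"),
  team2         = c("BAL", "SFG"),
  money_payout1 = c(40, 72.5),
  money_payout2 = c(250, 120),
  stringsAsFactors = FALSE
)

payout <- payout_wide %>%
  # Glue each team to its payout so they travel together through gather.
  mutate(team_payout1 = paste(team1, money_payout1, sep = "_"),
         team_payout2 = paste(team2, money_payout2, sep = "_")) %>%
  # One row per game and team; one_two is a throwaway key column.
  gather(key = "one_two", value = "team_payout",
         team_payout1, team_payout2) %>%
  # Split the glued value back apart; convert = TRUE makes payout numeric.
  separate(team_payout, into = c("team", "payout"),
           sep = "_", convert = TRUE) %>%
  select(-one_two, -money_payout1, -money_payout2)
```

Passing sep = "_" to both paste and separate matters here: without it, paste uses a space and separate's default splits on any non-alphanumeric character, including the decimal point in a payout, which is one way to end up with an additional pieces warning.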
So, something that's missing from this, however, is something we need to add back into our analysis.r file. We need to add who the favorite team was so that we know, effectively, who we've bet on. So, I'm going to scroll back up here and we're going to add a bunch of stuff, not really a bunch of stuff, but a bunch of the same stuff. So, if else: if rating prob 1 is bigger than rating prob 2, then team 1, otherwise team 2. And I'm going to call this team. So, fave 538 team. We just keep track of this as we go along: the favorite won, the favorite prob, the favorite team. Okay. And if we scroll down, this is where we get the win-loss records. This is where we do the join. So, we're going to then do another if else here. But again, this is team: team 1, team 2. And we want to add the team to the data frame. And then down here, we're going to do this a couple of times again. We're going to want to add the fave WP current team, the fave WP prob team. Maybe I'll break this select up a little bit so it's easier to see. It's good I did this, because I'm seeing what all I'd forgotten.
So, we need to add the team. I'm going to run all this. We still need to do the money line model. Fave 538 team not found. Okay. So, we lost that somewhere up right here. Fave 538 team. This again is why code readability is so important: it really helps you to find bugs in your code. So, again, we need to run this again. And so, we've got 538 won, prob, team; won, prob, team; won, prob, team; won, prob, team. Excellent. So, now we come to the money line stuff and add the team here. We see that we've added won, prob, team for the money line.
So, now we need to make this all tidy, right? We're going to now paste together the won, the prob, and the team. We want team down here. And if we look at tidy win prob, we've got the season, the date, the game, team 1, team 2, the scores, the model, whether the favorite won, the probability, and the team that was favored. And we're going to then write this out to CSV.
And so, that's great. So, now we can come back here and we're going to save this as our payout variable, where we left off. And now we're going to want to do an inner join between the payout data and the model data. I'm going to copy this and say tidy win prob is read CSV and close parentheses. But again, to develop this, I'm going to hold off on assigning it to a variable name. We run that, we see what it looks like. And so, we can do an inner join between this data frame and payout, where we're going to join on the date, the scores, and the game, but then also join between the favorite, which is in the team column, and the team column here. So, team and team. I guess we could have called this favorite or something else. That's fine. This should work great.
So, we'll then do an inner join of that with payout, and we'll do date, game, team 1, team 2, score 1, score 2. I think these all need to be in quotes. And then we'll also add team. So, the team is the favorite team in the tidy data frame, and in the payout data frame, team is the column that corresponds to the payout amount. So, if we run all this... I'm going to make things easier to see. I'm going to get rid of a couple of things. So, I'll get rid of team 1, team 2. Does that help? Score 1, score 2. So, we see the season, the date, the game, the model, whether the favorite won, the probability of them winning, the team that won, or the team that was favored, I'm sorry, the money probabilities, whether the money line favorite won, their probability, and their payout. So, we could junk a few more of these columns. But I think we generally get the idea here. All right. So, the next thing is that the payout, if the favorite won, sorry, if the favorite didn't win... actually, you know what? So that I don't get too horribly confused.
Let me look at payout again. I want to clean this up a bit. There's a lot of stuff in here that I don't need. So, I'm going to do select minus money prob 1, minus money prob 2, minus fave money won, and minus fave money prob. I guess that's from up here. And I'm just copying code. So, this is kind of the problem with copying code, right? Fave money prob not found. So, it's F-A-V. And so, then if I look at payout, it should be a lot simpler. Great. And now if I do this inner join, I have the season, the date, the game, team 1, team 2, scores. I probably don't even need these. The model, whether the favorite by that model won, the probability of them winning, the team that was favored, and then the payout.
So, if the Cubs won last Sunday and we bet 100 bucks, we'd get $72.50 out. The San Francisco Giants lost. They were the favorite by the FTE model, but they lost. So, we should lose 100 dollars, okay? So, we need to now update this to do mutate payout equals if else won, so if that's true, then we get the payout. Otherwise, we get minus 100. And so, we see that if favorite won is false, then we get negative 100 dollars, okay? Awesome. So, now we've joined our payout information with our model data.
And what I'd like to get here is... I'm not so concerned about game by game, but maybe day by day would be a nice granularity to start to work with for our data analysis. And so, I'm going to do a group by date, game, team 1, team 2, score 1, score 2, or no, I'm sorry. No, I just want to do it by date. I don't want to do it by individual game. So, I'm going to group by date and model. We're going to summarize and say day's winnings equals sum of payout. And so, we see that on April 5th, I think there was one game. We'd lose 100 dollars by most of the models, except for the WP live, where we correctly got the winner.
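The join and the win-or-lose payout rule described above can be sketched as follows. Everything here is toy data: the opponents OPP1 and OPP2, the model labels, and all payouts other than the $72.50 mentioned for the Cubs are made up, and the scores are left out of the join keys to keep the example small (the demo also joins on score 1 and score 2).

```r
library(dplyr)

# Toy model predictions: one row per game and model; `team` is the favorite.
model_data <- data.frame(
  date    = as.Date("2018-07-15"),
  game    = 1,
  team1   = c("CHC", "SFG", "CHC", "SFG"),
  team2   = c("OPP1", "OPP2", "OPP1", "OPP2"),
  model   = c("fte", "fte", "wp_current", "wp_current"),
  team    = c("CHC", "SFG", "CHC", "OPP2"),
  fav_won = c(TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Toy tidy payouts: one row per game and team, for a $100 bet.
payout <- data.frame(
  date   = as.Date("2018-07-15"),
  game   = 1,
  team1  = c("CHC", "CHC", "SFG", "SFG"),
  team2  = c("OPP1", "OPP1", "OPP2", "OPP2"),
  team   = c("CHC", "OPP1", "SFG", "OPP2"),
  payout = c(72.5, 130, 80, 115),
  stringsAsFactors = FALSE
)

daily_winnings <- model_data %>%
  # Match each model's favorite to that team's payout in the same game.
  inner_join(payout, by = c("date", "game", "team1", "team2", "team")) %>%
  # Win: collect the payout. Lose: forfeit the $100 stake.
  mutate(payout = if_else(fav_won, payout, -100)) %>%
  group_by(date, model) %>%
  summarize(days_winnings = sum(payout)) %>%
  ungroup()
```

In this toy example the fte rows net 72.5 - 100 = -27.5 for the day, while the wp_current rows net 72.5 + 115 = 187.5.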
The other thing that we want to keep track of is the running total, because you might have a horrible day, but there are many days. So, we'll add cumulative winnings, and we'll do a cumsum of day's winnings. And so, on April 6th, something weird is happening. So, it should be 366. I think this needs to be a mutate verb. Let's try that again. And that should be a pipe. So, 266. So, I think we're having a problem with the grouping: we don't want to group by date. We want to keep grouping by the model. So, we do group by model, and for FTE, 266, 366, right? Great. So, if we want to see this more clearly, we could do arrange date, or do model, date. And so, what we see is that the day's winnings for the first 10 days of the season are here, and that, you know, there are a couple of days in here for FTE where we made a little bit of money, but then aggregated over time, we see we lost a lot of money. Excellent. So, we'll get rid of that. And I'm going to then do an ungroup to get rid of all that grouping. And we'll call this daily winnings. That's our daily winnings. We get that, right? So, we have the date, the model, day's winnings, cumulative winnings.
We could also do annual winnings. I'm going to take daily winnings, or I guess we could take payout. So, right? Payout, I think, has a season in it? Nope, it doesn't. So, I'm just going to take daily winnings, and I'm going to mutate to make a year column, and that's going to be the year function on date. Let me just double check that that works. If we do year, we get 2018, excellent. So, we mutated that. We're going to then group by model and year, and then we're going to summarize year's winnings equals the sum of day's winnings. And then if we look at annual winnings, we see annual winnings for each model by year. You can see a lot of red. This is not looking good. And then we can do total winnings, where we take annual winnings, and we're going to group by...
So, we need to do an ungroup here. So, we're going to group by model. I'm going to summarize total winnings as the sum of year's winnings. We look at total winnings. We see we should not do this. Let me just arrange this by desc total winnings. And so we see by model that WP current, again the win probability model where we know the final winning average for each team, will make about $115,000. Otherwise, by these other models, we're going to lose a fair chunk of money. And surprisingly, the 538 model is going to lose us the most over that time, whereas the win probability model will lose us the least. So, the point is the bookies are always going to win, because we don't know the outcome of the season yet, right?
So, let's see what this looks like if we plot it. I'm going to take our daily winnings and pipe that to ggplot. And we'll do aes, where x is going to be the date. Our y is going to be daily winnings. And we're going to group by the model. And we're going to ship that to geom_line. So, that should be a plus sign. Pick scale for object of type, blah, blah, blah. That should be day's winnings. So, this looks pretty hideous. We also want to do color equals model. Again, pretty hideous. I'm going to filter to get the models that we're most interested in, I think. So, model equals equals FTE, model equals equals money, model equals equals WP current. Why is this taking so long? Maybe nothing came out of this. Oh! So, comma means and. I want the vertical pipe, which means or. So, there was no data that got plotted. So, we see that. Again, that looks pretty hideous.
I'm going to do a facet grid where we will do dot tilde season. So, I'm going to make a different facet for each season. At least one layer must contain all variables used for faceting. And I don't have a season, do I? So, we want to do daily winnings, mutate season equals year of date. And so, each year is represented in a different facet, but each facet still has the full range of dates on the x-axis.
So, we want to do scales equals free_x, so that each facet gets its own scale over time. So, this is the day's winnings. And really, what we want are the cumulative winnings. And what we can see is this is cumulating over the years. So, we grouped by model and not by season. So, let's do mutate season equals year of date. And we're going to group by season. So, let's ungroup this and group by season and model. That's what we want. And that, season undone. Mutate season equals year of date. Really? You don't know what that is? So, season's there. Actually, season is already there. Okay, mutate, group by model. So, we have date. We don't have the season. So, let's group by season, date, model. Season, date, model, daily winnings. And then we're going to group by the season and the model to get the cumulative winnings. And we have the season, the date, the model, blah, blah, blah. And so, that's ungrouped, right? And we can plot this. Cumulative winnings. So, we do tail of daily winnings. So, we don't need to do this mutate anymore because we already got that in daily winnings. The aes is date, cumulative winnings, group model. There we go.
So, again, what we see is for each season, we have the cumulative winnings for those seasons. And we see that the WP current model does pretty well. The 538 model doesn't do all that great. Of the models, it actually did the worst when we went to try to bet on those predictions. The money line: if we bet on those favorites according to the money line, then we're going to lose money. Which is to say that the casino will make money. Which is their thing, right? So, if you're picking the favorite all the time, then you're not going to make your money back. So, some context for this. I mean, don't get me wrong. This is a lot of money. You're going to lose, like, say, $12,000.
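The fix worked out above, restarting the running total each season and giving each facet its own x-axis, might look something like this. It is a sketch on invented numbers: the column names and the single "fte" model label are assumptions, and the season is derived with base R's format rather than a year function to keep the dependencies to dplyr and ggplot2.

```r
library(dplyr)
library(ggplot2)

# Toy day's winnings spanning two seasons. Names and values are assumptions.
daily <- tibble(
  date = as.Date(c("2017-04-03", "2017-04-04", "2018-04-06", "2018-04-07")),
  model = "fte",
  days_winnings = c(-20, 50, -100, 30)
)

# Group by season AND model so the cumulative sum restarts each season.
by_season <- daily %>%
  mutate(season = as.integer(format(date, "%Y"))) %>%
  arrange(season, model, date) %>%
  group_by(season, model) %>%
  mutate(cumulative_winnings = cumsum(days_winnings)) %>%
  ungroup()

# One facet per season; scales = "free_x" lets each facet show only its own dates.
p <- ggplot(by_season,
            aes(x = date, y = cumulative_winnings, group = model, color = model)) +
  geom_line() +
  facet_grid(. ~ season, scales = "free_x")
```

If the grouping were by model alone, the 2018 line would start from wherever 2017 ended rather than from zero, which is the carry-over across years seen in the first version of the plot.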
But if you think about it, over the course of a season, there's 30 teams, and they each play 162 games, divided by two because you don't want to count games twice, so that's 2,430 games, and then you multiply that by 100, so the total amount you had bet over the course of the season would be just above 243,000, right? Because you've got to include the postseason as well. 538 looks horrible here, but it's really losing, say, 12,000 out of 243,000. So, you're losing about 5%, which isn't fantastic, but it's not a huge fraction of the money you bet, okay?
So, finally, I want to clean this up a bit, and I'm going to steal some of our styling from before for our betting simulations, where we will do breaks of FTE, money, WP current. Yep, that's great. Our x-axis will be date, y will be winnings. I'm going to say don't gamble based on the models' predictions. All data since 2009. I'm going to also add theme_classic, and let's run that, make sure it looks good. Nice. And I'm going to add a horizontal line across the middle. So, I will do geom_hline, aes, intercept equals 0. I think it's supposed to be yintercept. Boom. Very nice. Kick this over a notch to make it more readable, and we're done. Great.
So, I hope you enjoyed following along with me what we've been doing. I'm going to finish up here by committing our changes: simulate betting on favorites. Closes number six. And I've saved that. Git checkout. We go back to git checkout. That looks right. I've saved it. I'm going to close it. Again, go to terminal, git checkout master, git merge betting, and then git push, and go back to our issue tracker. And we've done that, that, that, that, that. And the issue's been closed. Fantastic. Well, there's clearly a lot more we could do. I hope this series of demos has been helpful in seeing how you can use R, statistical software, to program data analyses to try to validate different models.
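The back-of-the-envelope context and the final styling described above could be sketched like this. The arithmetic comes straight from the numbers mentioned (30 teams, 162 games, $100 per game, roughly $12,000 lost); the toy data, the column names, and the choice of title versus subtitle for the two labels are assumptions, and the color breaks borrowed from the earlier styling are left out because their exact values are not given here.

```r
library(ggplot2)

# Regular-season games: each game involves two teams, so divide by two.
games_per_season <- 30 * 162 / 2            # 2430 games
total_wagered <- games_per_season * 100     # $100 on every game
loss_fraction <- 12000 / total_wagered      # share of the stake that was lost

# Toy cumulative winnings for two models. Names and values are assumptions.
toy <- data.frame(
  date = as.Date(c("2018-04-01", "2018-04-02", "2018-04-01", "2018-04-02")),
  model = c("fte", "fte", "wp_current", "wp_current"),
  cumulative_winnings = c(-100, -60, 80, 150)
)

p <- ggplot(toy,
            aes(x = date, y = cumulative_winnings, group = model, color = model)) +
  geom_hline(yintercept = 0) +              # break-even reference line
  geom_line() +
  labs(x = "Date", y = "Winnings",
       title = "Don't gamble based on the models' predictions",
       subtitle = "All data since 2009") +
  theme_classic()
```

Here yintercept is passed as a constant outside aes, which is enough for a fixed reference line; drawing it before geom_line keeps the model lines on top.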
I think the Elo model seems to be a great model for predicting winners and losers. It just so happens that when you then apply that to the actual betting lines for the individual games, you're going to lose money. But again, remember the game's fixed, right? Like, the casinos know how to make money, and so it's impossible to make money doing this. But it does a pretty good job of making predictions. The estimates of the actual win probability were a little too conservative. I don't think that affects how much money you're going to make. But again, thinking back about the 2016 election and whether or not the model was right, the challenge is that we only have one game. We only have one iteration of that election. And so we don't know whether or not the model was correct. If Hillary had won, we still wouldn't know if the model was correct, right? And so the beauty of sports data is that we can validate these models, and we could do the same thing with basketball or football or anything else. So, thanks again for hanging out with me and doing this analysis during the All-Star Break, and I'll talk to you soon.