Hey everyone, this is going to be a quick video on how to do probabilistic forecasting with R packages, specifically the fable package. We're going to be using the M5 dataset and generating the plots you see here. In this video we'll cover how to do these grouped forecasts for California, Texas, and Wisconsin using a few different benchmark methods: the mean forecast, the naive forecast, and the seasonal naive forecast. At the end, we'll zoom in on the seasonal naive forecast. So let's get started.

I'm going to go through this line by line. First we import some libraries, then I set a ggplot theme. Next we import the data and set up the training set and test set. The test set is the last three weeks, so three weeks is also our forecast horizon.

Here's our forecast model. We're fitting a plain seasonal naive forecast, a mean forecast, and a naive forecast. The seasonal naive forecast is lagged by one week; you could choose different lag values, but our time series seems to have weekly seasonality. We could also add a transformation here, like a Box-Cox or log transformation, but we're keeping it simple in this video.

One really cool line here is the generate function, where we specify how many days out we're going to forecast. We do 200 draws, built using bootstrapped residuals, and then put the result into a tidy data frame (a rough sketch of this whole workflow is shown below). Just so you have a sense of what's in it, I'll print it out: we've got the state ID in one column, the model used (there are three: seasonal naive, mean, and naive), the date, the repetition (those are the draws, so there will be 200 repetitions), and the simulated value.

Then we move on to some plotting. I'll describe what's going on in this plot. The rows are the states: California, Texas, and Wisconsin. The columns are the forecast methods: mean, naive, and seasonal naive. The mean forecast uses the historical average, treats it as the most likely value, and builds the distribution around it. The naive forecast picks the last observed value as the most likely value; this is sometimes called a random walk forecast. The seasonal naive forecast is just like the naive forecast but with seasonality baked in: instead of picking the last value, we pick the value from seven days back, one week, to capture the seasonal component. You can see that most of these time series do have some seasonality, so it looks like the seasonal naive forecast is the benchmark of choice, the method to compare against when we go on to more advanced modeling. So I wanted to zoom in on the seasonal naive method and look at it in a little more detail. This next line, or I should say the next chunk, takes care of the seasonal-naive-only plot.
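For reference, here is a minimal sketch of the workflow described above. It assumes the data has already been loaded as an M5-style daily tsibble named `sales_tsbl` with columns `date`, `state_id`, and `sales`; those names are my assumption and may differ from the actual script in the video.

```r
library(fable)    # also attaches fabletools
library(tsibble)
library(dplyr)
library(ggplot2)

days_out <- 21  # forecast horizon: the last three weeks

# Split into training and test sets on the date index
train <- sales_tsbl %>% filter(date <= max(date) - days_out)
test  <- sales_tsbl %>% filter(date >  max(date) - days_out)

# Fit the three benchmark models per state
fit <- train %>%
  model(
    mean_fc   = MEAN(sales),
    naive_fc  = NAIVE(sales),
    snaive_fc = SNAIVE(sales ~ lag("week"))  # weekly seasonality
  )

# Simulate 200 future sample paths per model using bootstrapped residuals,
# then flatten into a tidy data frame
sims <- fit %>%
  generate(h = days_out, times = 200, bootstrap = TRUE) %>%
  as_tibble()  # columns: state_id, .model, date, .rep, .sim

# Plot the draws: rows are states, columns are forecast methods
sims %>%
  ggplot(aes(x = date, y = .sim, group = .rep)) +
  geom_line(alpha = 0.1, colour = "steelblue") +
  facet_grid(state_id ~ .model)
```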
One thing I didn't point out is the white line here: this part is the training set, and this part is the test set. I drew the test-set line as an overlay to show how well these models are fitting, or not fitting. You can picture each forecast point as a histogram. You could say that maybe this first point falls around the 90th percentile, while these other points are closer to the median. Overall, though, the actuals tend to sit high on our distribution, so it looks like we may be under-forecasting a little.

Rather than just eyeballing plots, we can use the CRPS (continuous ranked probability score) to put numbers on how good the distributional fit is. Just a quick note: this function wasn't my own doing. It comes from Rob Hyndman's research group, tidyverts, who make forecasting software, so I've linked to where I grabbed it; I want to make sure they get the credit, not me. Here I'm calculating CRPS so we can see what's fitting well (a rough sketch of the calculation is included below). This is CRPS sorted by state ID and mean CRPS, and seasonal naive is doing the best for California, for Texas, and for Wisconsin.

The way I normally go about this is to say: we're going to start forecasting these different states, and looking at these benchmarks, we'll pick seasonal naive as the benchmark method and start building more complex forecasting methods from there. It's nice to have a benchmark, not just to ask whether you can beat it, but to quantify how much better you can do. Sometimes the gain is huge and clearly worth it; sometimes the gain is small and still really worth it. If you can tie costs to how much better your alternative forecast is, you can decide whether the more complex method is worth pursuing, or whether the thing you're forecasting will be handled just fine by a benchmark method. But no matter what, you should always have some kind of benchmark method running in the background to make sure your production method is staying on track, and so that if something goes a little wild or moves in an unpredictable way, you have a benchmark to fall back on. It's kind of a reliability-engineering feature. That's all I've got today. Thanks for watching, and we'll see you next time.
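Here's a minimal sketch of how the CRPS can be computed from the simulated draws, using the standard sample-based estimator mean|X - y| - 0.5 * mean|X - X'|. This is an illustrative stand-in I wrote, not the actual function from the tidyverts group linked in the video; `sims` and `test` come from the earlier sketch.

```r
library(dplyr)

# Sample-based CRPS for one actual value y and a vector of simulated draws x
crps_sample <- function(y, x) {
  mean(abs(x - y)) - 0.5 * mean(abs(outer(x, x, "-")))
}

# Join the simulated paths to the test-set actuals, score each day,
# then average CRPS over the horizon for each state and model
sims %>%
  left_join(as_tibble(test), by = c("state_id", "date")) %>%
  group_by(state_id, .model, date) %>%
  summarise(crps = crps_sample(first(sales), .sim), .groups = "drop") %>%
  group_by(state_id, .model) %>%
  summarise(mean_crps = mean(crps), .groups = "drop") %>%
  arrange(state_id, mean_crps)  # lowest CRPS per state = best-fitting benchmark
```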