So, welcome to the afternoon sessions. Our first speaker is Nikolas Kuhabt from Data Convolution. Nikolas studied mathematics in Germany and in the United States and became passionate about big data and data science during his master's thesis. He then worked as a big data consultant, later moved to Fraunhofer IEE to work as a research data scientist, and since 2020 he has been working as a freelance data scientist. His talk is titled "Probabilistic Forecasting with DeepAR and Amazon Web Services SageMaker". Nikolas, please start sharing your slides and let's begin.

Thanks for the nice introduction. I hope you all had a great break. My name is Nikolas Kuhabt, I join you from Germany, from the wonderful city of Kassel, and I'm a freelance data scientist. I like forecasting time series, and about a year ago I read about a forecasting algorithm called DeepAR. It was published by Amazon Research, and as I got into it I found it has some nice features, so I thought it would be worth giving a talk about it.

The first question that arises is: do we actually need yet another forecasting algorithm? I would claim yes, DeepAR has its right to exist, because the combination of features it offers makes it quite unique, and it should be in the toolbox of a data scientist. So let's first understand what this algorithm actually is and what those features are.

The first one is already in the title: probabilistic forecasting. Imagine I tell you tomorrow is going to be sunny. That's a forecast, you're happy and leave your umbrella at home. Another time I tell you tomorrow is going to be sunny with a probability of 60%. That's different information; you may not leave your umbrella at home if you know the probability is only 60%. The point is that attaching probabilities adds real value to a forecast: not just the forecast itself, but how likely it actually is, and some boundaries within which it may lie. That's the big feature of DeepAR. There are other algorithms that can do this, like ARIMA and regression models, but plain LSTMs, which are neural networks, cannot, at least not out of the box.

Another point is automatic feature engineering. That's basically what you get when you use neural networks: you feed the inputs into the network, and the network models them in a way that automatically generates features. Plain LSTMs can do this as well, but the classical approaches like ARIMA and regression models cannot.

And the last one is also quite unique to DeepAR: we train one model for multiple time series. Our problem setting is that we do not have just one time series but many, which should be related in some way, even though they can obviously differ. I don't know of any classical forecasting approaches that cover this, where one model learns all the time series. There are some concepts that do something similar, like meta-learning. Meta-learning is about learning how to optimize: you take one time series and learn how to optimize for it.
Then, when you have another time series, you already know how to optimize such a series and can apply that optimization strategy. Transfer learning is similar, but it transfers not the optimization strategy but the knowledge itself: think of a model you have already trained that has detected some useful patterns, and those patterns would also help with another time series, so you transfer that knowledge. DeepAR does it in a slightly different way, and we will soon get the intuition behind it.

If I talk about the advantages, I also have to talk about the disadvantages. What is bad about this algorithm? Basically what applies to every neural network. It is time and resource intensive to train, so you don't get your results in a minute. And there are some hyperparameters to set which are not intuitive, I would say; you either need a lot of experience to set them correctly, or you have to tune them in iterations. That goes together with the first point: training is resource intensive, so if you cannot set a hyperparameter correctly the first time, it might become a long cycle of improvements.

Okay, let's briefly understand how it works, just in concept, no deep math. We have our time series; in the diagram, the time series is the x at the bottom. We give x as input to our neural network, which is the middle part; those of you who are familiar with LSTMs basically know this structure, but the details are not important here. The point is that the output of the network is not the prediction itself but the parameters of a probability function. Let's take an example: the Gaussian. The Gaussian has two parameters, the mean and the standard deviation, and those two parameters fully determine the probability function. What this neural network does is give us exactly those parameters, so the output of the network is the mean and the standard deviation. Then we are in the top layer, which is the likelihood ℓ, our probability function, and we can simply sample from it. If we chose the normal distribution, we sample from the normal distribution and get some value, and that is actually our forecast. So that's important to remember: the network doesn't output the prediction itself, it outputs the parameters of the probability function, and then we sample from that probability function; that sample is z.

You also see the dotted line going from z back down to the next input. That's where the "AR" in DeepAR comes from: AR means auto-regressive, and it means you feed the sample back into the input again.

One more remark. When we talk about multiple time series, you can see there is only one x as input for the network. So how do we handle multiple time series? Well, you have this one architecture, and you train this one architecture with every time series. The crucial part is that you have to scale each series before you put it into the network; scale your input and it is ready to be fed to the network.

One last remark: if we sample z at the top just once, we get just one prediction. But since we are sampling, we can sample multiple times, and that is what gives us our probabilities. Imagine we sample once at the first time step, feed it back into the input, sample at the second time step, then at the third, and so on; then we repeat the whole thing and get many values for z1, many values for z2, and so on. Say we have drawn 1,000 sample paths; now we can draw boundaries and say what the probability is of the prediction lying in a certain corridor. So sampling multiple times is what gives us the probabilities and the boundaries of our prediction. Below is a small toy sketch of that sampling idea.
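Just to make the idea concrete, here is a toy numpy sketch of that sampling loop. This is not the actual DeepAR implementation: `fake_network` is a made-up stand-in for the trained LSTM, and all numbers are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_network(history):
    # Stand-in for the trained network: in DeepAR the LSTM maps the (scaled)
    # history to the parameters of the chosen likelihood, here a Gaussian.
    mu = 0.95 * history[-1]
    sigma = 0.1 * abs(history[-1]) + 0.05
    return mu, sigma

def sample_paths(history, horizon=24, n_samples=1000):
    paths = np.empty((n_samples, horizon))
    for s in range(n_samples):
        h = list(history)
        for t in range(horizon):
            mu, sigma = fake_network(h)
            z = rng.normal(mu, sigma)   # sample one value from the Gaussian
            paths[s, t] = z
            h.append(z)                 # feed the sample back in: the "AR" part
    return paths

paths = sample_paths(history=[1.0, 1.2, 1.1])
# An 80% prediction interval is just quantiles over the sampled trajectories:
lower, median, upper = np.quantile(paths, [0.1, 0.5, 0.9], axis=0)
```

The only point of the sketch is that the forecast is a whole set of sampled trajectories, and the prediction intervals are simply quantiles over them.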
Now, where can you apply it? Here I have listed some sample data sets; some are from the paper, some are from experiences I read about online. Sales at Amazon is maybe the most obvious one, because the paper was published by Amazon. You can imagine that every item in the Amazon store is one time series, so you have many categories with many time series, and you want to forecast every one of them. And you can imagine that by having only one architecture, one neural network, the time series somehow learn from each other: by training on one time series, the model also learns something about the others.

The second point, sales in stores: I read about that at Axel Springer, which is basically a media company that sells magazines here in Germany. They want to know how probable it is that a magazine runs out of stock in a store. Every store would be a time series; the stores share some patterns the algorithm can learn, and then it forecasts every time series individually.

The next one, forecasting the load of servers in data centers: that's also Amazon. They have AWS, the cloud service, and it makes sense for them to forecast what the load is going to be in the future, so that they can provide more infrastructure and give guarantees about providing it. The last two examples are from the paper. For car traffic, you watch one lane and count how many cars pass by; one lane is one time series, and you can predict the traffic. And the last one: you have different households, and every household has its own energy consumption. Again, they share some patterns, some similarities, but every household is different in its own way. This is also the data set I trained the DeepAR algorithm on and used for experimenting.

Here we have one sample of how it looks, a basic chart of the energy consumption of one household. You can clearly see the daily patterns; I forgot the x axis labels, but at night the consumption goes quite low and during the day it goes up again. And just to show you, we have many of them: here we see eight, but I think the total data set has 350 households with their corresponding energy consumption.

So, as I said, DeepAR was published by Amazon, and therefore they also integrated it into their system, into AWS. Their machine learning service is called SageMaker, and SageMaker itself is big, with a lot of components. What I used were the SageMaker notebooks. You just create a notebook, and it is exactly the same as a Jupyter notebook as you know it. You just give it an instance name.
That determines which server runs behind the notebook; you can have really big servers with GPUs or a small one, and the costs differ accordingly. What I also found quite interesting is Ground Truth, in the left corner, for labeling. It integrates the Mechanical Turk service, which is a platform for distributing small work packages to people. So you can use that platform through Ground Truth to label your data: your data is really handed to people who label it, and you get the labeled data back. Of course it costs something, it's not for free, but I think it can be done quite cheaply.

Now let's see what we have to do to get DeepAR running. This is not to show you how imports work in Python, just to show you what we will need for the DeepAR algorithm. First, there is boto3. boto3 is basically the Python SDK of AWS; it lets you access every service within AWS. But for two services we have extra imports. In the second line there is the S3 file system, s3fs. S3 is the storage service by AWS, the name stands for Simple Storage Service, and you can save whatever you want there; we need it here to save our data and intermediate results. And then SageMaker, which I already told you about. What is interesting here is the execution role. The execution role is basically about permissions: we are in a notebook instance and we want to access S3, the storage, but who says we are allowed to access it? Or we want to deploy our algorithm to a server; who says we are allowed to do that? Amazon lets the services play together through roles. A role is basically a policy which allows you to do certain things with other services, and in our notebook we fetch the execution role and are then allowed to do whatever that role permits. And the last one: the AWS API is a little different from what we know from scikit-learn, for example. The image URI is how we tell AWS which algorithm we want to use. We don't import XGBoost, we don't import a regression model; we get a container image for our algorithm.

Then the interesting part: how do we need to prepare our data? Here again it's a little different. Normally we are used to pandas DataFrames; here it is JSON Lines, and every JSON line is one time series. So in total we see three time series here, and we need at least two fields. We need start, which is just a timestamp as a string saying where the time series starts, and we need target, which is the time series itself, the thing we want to forecast. Then there are two optional fields. The first is cat, which stands for category; you can think of it as a feature that describes the category of the time series. Say we have the energy households, then maybe we have different categories of households: the first one is a family home, the second one is a single-person home, and maybe the third one has a Tesla, which puts it in a different category. The last optional field is dynamic_feat, which means dynamic feature. This is an additional time series which gives information about our target series. For the households, if we want to forecast the energy, a good dynamic feature might be the weather: if it's nice outside, maybe people go out and don't use that much energy. So we could include the weather or the temperature as a dynamic feature, and we could also include several such time series. What is important is that the length of each list in dynamic_feat has to be the same as the target, so that for every time point in the target we have the additional information. A rough sketch of this setup and of the data format follows below.
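Going back to the imports for a moment, a minimal setup sketch might look like this. This assumes the SageMaker Python SDK v2 and that it runs inside a SageMaker notebook, where get_execution_role() picks up the notebook's role; it is an illustration, not the exact code from the talk.

```python
import boto3                      # the general Python SDK for AWS
import s3fs                       # file-system style access to S3
import sagemaker
from sagemaker import image_uris

session = sagemaker.Session()
role = sagemaker.get_execution_role()   # the permissions attached to this notebook

# Instead of importing an algorithm class, we ask AWS for the DeepAR container image:
region = boto3.Session().region_name
image_uri = image_uris.retrieve("forecasting-deepar", region)
```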
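And here is roughly what the JSON Lines training data could look like; the values, the bucket name, and the path are placeholders for illustration.

```python
import json
import s3fs

# One JSON object per line, one line per time series. "cat" and "dynamic_feat"
# are optional; every dynamic feature must have the same length as "target".
series = [
    {
        "start": "2014-01-01 00:00:00",
        "target": [2.1, 2.3, 1.9, 2.0],              # hourly consumption of one household
        "cat": [0],                                  # e.g. type of household
        "dynamic_feat": [[18.0, 17.5, 17.0, 16.8]],  # e.g. outside temperature
    },
    {
        "start": "2014-01-01 00:00:00",
        "target": [0.4, 0.5, 0.6, 0.5],
        "cat": [1],
        "dynamic_feat": [[18.0, 17.5, 17.0, 16.8]],
    },
]

fs = s3fs.S3FileSystem()
with fs.open("my-bucket/deepar/train/train.json", "w") as f:  # placeholder bucket/path
    for ts in series:
        f.write(json.dumps(ts) + "\n")
```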
Then we have some hyperparameters to set. Most of them are just what you know from neural networks. The first one is the time frequency: if you look back at the preparation of our data, we only gave it the start date, but nothing says how the series continues, and the time_freq hyperparameter says, for example, that there is an hourly distance between the individual target values. context_length means how much history we want to give the model in order to forecast; here 72 means we look at three days. And the next parameter, prediction_length, is 24, so we forecast the next 24 hours. Of the other hyperparameters, the likelihood, here gaussian, is also interesting: this is the probability function we want to use, the one I told you about at the start, which we sample from. One short note on choosing the likelihood: two things are important. First, because you want to sample from it, make sure it's easy to sample from, otherwise it will take a lot of resources. And second, you also need its gradient, so make sure you can calculate the gradient of the probability function. Otherwise, the hyperparameters are the ones we know from neural networks.

So let's see what we need for training. First, we initiate a SageMaker session. Then, as I told you, we get the execution role, which is where the permissions come from. And last, you see the image URI, where it says forecasting-deepar; that's how we tell it which algorithm we want to use. Then we put this information together and can start training. We initiate an estimator and give it the session, the image name, and the role. What is also interesting is that we give it the name of the instance we want to use, which is basically just a server. Amazon tells you how much each server costs; with a bigger one, training goes faster, but it's also more expensive. I used the ml.c4.xlarge, which costs about 45 cents per hour, and training on the energy households took less than an hour, so I paid about 50 cents in total. We also give it the S3 bucket where we save our results, and in the last line we pass the hyperparameters from the slide before.

Now we start the fitting process, the training. Here again it looks a little different how we hand the data to fit: we do it with S3 buckets. In our S3 bucket the data is saved as JSON Lines, and we pass the path. So we give the data channels to the estimator, we start the fit process, and it just works, nothing more to do.

What we have to do at the end is deploy the model. Again, this is quite easy with AWS: you take the job name and tell SageMaker to create an endpoint, and behind that endpoint the model is deployed. We again have to give it an instance; here it's a slightly smaller server. Then you can basically query the endpoint, for example via REST, and get your predictions back. I think this approach is quite clever. A rough sketch of these training and deployment steps follows below.
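Putting the training pieces together, a rough sketch with the SageMaker Python SDK v2 might look like this. The bucket, the paths, and the epochs value are placeholders; the ml.c4.xlarge instance and the hyperparameter values are the ones mentioned in the talk.

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri=image_uri,                           # the DeepAR container image from above
    role=role,
    instance_count=1,
    instance_type="ml.c4.xlarge",                  # the roughly 45 cents/hour training server
    output_path="s3://my-bucket/deepar/output/",   # placeholder bucket for the results
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    time_freq="H",            # hourly data
    context_length=72,        # look at three days of history ...
    prediction_length=24,     # ... to forecast the next 24 hours
    likelihood="gaussian",    # the probability function we sample from
    epochs=20,                # placeholder value
)

# The data channels point to the JSON Lines files stored in S3:
estimator.fit({
    "train": "s3://my-bucket/deepar/train/",
    "test": "s3://my-bucket/deepar/test/",
})
```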
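And one way to sketch the deployment and querying step, again assuming the SDK v2: the request layout follows the DeepAR inference format, with the recent history of a series plus a configuration for samples and quantiles, and the series values and instance type are placeholders.

```python
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",       # a somewhat smaller server for inference
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# The endpoint takes the history of a series plus a configuration saying how
# many sample paths to draw and which quantiles to return.
request = {
    "instances": [{"start": "2014-01-08 00:00:00", "target": [2.0, 2.2, 1.9]}],
    "configuration": {
        "num_samples": 100,
        "output_types": ["quantiles"],
        "quantiles": ["0.1", "0.5", "0.9"],
    },
}
response = predictor.predict(request)
```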
So what's left? Let's look at two examples of results. We again see the daily patterns, and we see the 80% confidence interval, and most of the time the actual time series lies within that 80% interval. We also see two different patterns here, but both are fitted quite well. In the last example, the patterns differ slightly again, but again the fit is quite good, and in the lower plot you can also see the weekly pattern: five weekdays and then the two weekends, and the weekends are also captured by the forecasting algorithm. This concludes my talk. Thanks for listening, and feel free to ask me questions afterwards.

Thank you very much for your talk. We have time for one or two questions quickly. I see there are no questions yet in the Q&A section, but I do have two questions myself. One is not really related to the content of your talk, it's more about your setup, how you stream your camera and share your full screen with OBS. The other question is about DeepAR itself: it's an Amazon algorithm, but I saw there are some open source implementations too. Did you test them, and are they reliable? I saw them but did not test them. They are implemented mainly in PyTorch, and I think one or two implementations are in TensorFlow, but I have not tried them. Okay, great. Thank you very much again. If there are more questions, please go to the Discord channel; just with Command-K you can search for "forecast", you will find the channel, and you can ask Nikolas more questions there. Thank you very much again. Thank you.