Hello everyone, thank you for joining my session on time series forecasting using a coronavirus dataset. A little introduction about me: I'm Sona Pankaj, I'm working as a software lead, and I have also been a visiting faculty for robotics and computer vision.

A quick look at today's agenda. We are going to discuss exponential growth in epidemics and why it matters, because it is tough for the human mind to intuit exponential growth. Then the history of pandemics, since this talk is about coronavirus; flattening the curve, and why social distancing and masks are important for keeping the coronavirus from spreading unchecked; and metrics for prediction: we will be trying various algorithms and seeing how they perform, so we have to define one metric to judge which of them performs well. Here we will use RMSE, but you can use any other metric. We will also cover the difference between prediction and forecasting, which often confuses people, and finally the forecasting algorithms themselves: machine learning, non machine learning, and some others.

So let's get started, with a question. If there is a water lily in a pond and it doubles its population every day, on what day will it reach a population of 1,000 water lilies? You have to think exponentially. Will it be week one? Week two? Thirty days? The answer is 10 days: it takes only 10 days for that one water lily to grow into a population of 1,000.

A little history of pandemics. Pandemics are not new; throughout history there have been cases like the Black Death, which wiped out about 40 to 50% of Europe's population. Smallpox was also considered a pandemic before its vaccine was discovered. As of 30th September 2020, the global coronavirus death toll had passed one million. According to the researchers at Johns Hopkins, more than 33.6 million cases had been confirmed, and there could be more.

Now, why flattening the curve is important. We read this jargon a lot in the media: flatten the curve. If we allow the disease to spread at its natural R0, its natural reproductive rate, it follows the red curve, and that red curve is an exponential curve. So many people would be affected that demand would surpass the medical facilities the country has. To keep the growth lower, so that every patient can get attention, it is important that the confirmed cases follow the logistic curve instead, so that we still have enough facilities: enough masks, and maybe enough ventilators.

Next, the metrics to measure performance. There are several: the error, which is forecast minus actual; the mean squared error; the root mean squared error; and the mean absolute error. We will be using RMSE. RMSE is a quadratic scoring rule that measures the average magnitude of the error: it is the square root of the average of the squared differences between the predictions and the actual values.
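In code, RMSE comes down to a few lines. This is a minimal sketch, not the exact notebook code; `actual` and `forecast` are hypothetical arrays:

```python
import numpy as np

def rmse(actual, forecast):
    # Square root of the average of the squared differences
    # between the predictions and the actual values.
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.sqrt(np.mean((forecast - actual) ** 2))

# Example: rmse([100, 110, 120], [98, 112, 119]) is about 1.73
```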
So what is prediction and what is forecasting? Say you have collected data up to some point in time. If you are predicting values within the data you have collected, that is prediction. But if you are predicting on a future timeline, say 5 days ahead, then it is forecasting.

The algorithms we will discuss today are naive forecasting; moving average; machine learning algorithms like linear regression and support vector machines; a neural network algorithm, LSTM (long short-term memory, a special case of recurrent neural network); and then Prophet and ARIMA.

Suppose we see a series full of white noise and we want to perform naive forecasting on it. What is naive forecasting? It is nothing but taking the previous data point and repeating it: each forecast is just the last observed value. So you can see it simply overlies the actual curve; the blue line is your actual data and the orange line is the predicted data. If you look at the code, it is just the same series appended one unit later. That is naive forecasting. If you do it on the coronavirus dataset, it looks something like this: nothing new, just superimposed on the actual graph.

Now we move on to moving average. In moving average, we find the mean of the data over a window and use it as the forecast some days later, say five days. If you look at the code of moving average (hello? I think there was a delay while you were switching the slides. Can I go on? Yes, please.) So if you see the code difference between naive forecasting and moving average, there is a window size here. This window size is the number of days you want to predict into the future, maybe five days, maybe 30 days. Plotted on the coronavirus dataset, it looks something like this: blue is the actual line and orange is the predicted line. You can see a lag here, and that lag equals the window size that was applied. It is following the trend, but we still cannot call this good forecasting; there is a lot of difference between the actual values and the predictions.
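As a rough sketch of these two baselines, assuming `series` is a pandas Series of day-wise confirmed cases (a stand-in name, not the speaker's exact variable):

```python
import pandas as pd

# Naive forecast: today's forecast is simply yesterday's value,
# i.e. the same series appended one unit later.
naive_forecast = series.shift(1)

# Moving-average forecast: the trailing mean over a window, used as
# the forecast `window` days ahead (5 here, matching the talk), which
# produces exactly the window-sized lag visible in the plot.
window = 5
moving_avg_forecast = series.rolling(window).mean().shift(window)
```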
Let's move on to the machine learning algorithms and see how they perform. The first is linear regression. In vector form, the linear regression prediction is ŷ = θᵀ·x, where θ is the parameter vector and x is the data. Graphically, linear regression is a line passing as close as possible through the maximum number of points: here the red line is the regression line and the blue dots are the data points.

Now, how do we train a linear regression model? We need to find the value of θ that minimizes the root mean squared error. x and y are the data; we tweak θ so that the mean squared error is minimized. Training a model means setting its parameters so that the model fits the training set.

To code linear regression, we take the day-wise confirmed cases from the dataset, divide it into a training set and a validation set, and import LinearRegression from scikit-learn. We fit it on the training days and confirmed cases with .fit, and then predict on the validation days. The plot looks something like this: the actual cases are the blue line and the predicted confirmed cases are the black line. The model is trying to pass through the maximum number of points, but there is still a large gap between the actual and predicted values.

So let's make it a little more complex. From linear regression we can move on to curve fitting, increasing the order of the line, and for that we can use a support vector machine. Support vector machines work at the hyperplane level. What is a hyperplane? To understand it, start from a line and a plane: a line extends a point along one direction, a plane extends a line along another, and in general a hyperplane spans n−1 directions in an n-dimensional vector space.

An SVM separates the data with a hyperplane, and the points closest to that hyperplane are the support vectors. There are three ideas to follow, and I will get into each one. The first is to maximize the margin: the distance between the hyperplane and the nearest points, the support vectors. The second is to minimize the number of misclassifications between positive and negative points. The third is projecting the points into a higher dimension: if the data has a nonlinear distribution, you cannot separate it with a line or a plane, but if you project it into 3D, as you can see in the demo here, you can then easily separate the classes with a plane.

Now, to code the support vector machine: GridSearchCV is a scikit-learn utility, so you can import it and initialize an SVR model with hyperparameters. The hyperparameters are the kernel and C. The projection from 2D into 3D is what is called the kernel trick, and the kernel defines what kind of boundary the model learns, linear or polynomial. C is the regularization parameter, which controls the penalty for misclassification and helps avoid overfitting. You can also keep the defaults; if you leave them out, the library picks sensible values. In GridSearchCV you give the estimator you want, the SVR model, and cv=5, where cv is the number of cross-validation folds. Whenever we fit, we do it on the training set, and whenever we predict, we do it on the validation set. So we fit on the training days and confirmed cases, take the best estimator with all the parameters it needs, and predict on the validation days.
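A minimal sketch of both models with scikit-learn; `train_days`, `train_cases`, and `valid_days` are assumed names for the day-index and confirmed-case arrays, not the speaker's exact variables:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Day indices must be 2-D for scikit-learn, shape (n_samples, 1),
# e.g. np.arange(n).reshape(-1, 1); train_cases is the 1-D array of
# confirmed case counts on the training days.

# Linear regression: fit on the training days, predict on validation days.
lin = LinearRegression()
lin.fit(train_days, train_cases)
lin_pred = lin.predict(valid_days)

# SVR wrapped in a grid search over the kernel and the regularization
# parameter C; cv=5 is the number of cross-validation folds.
param_grid = {"kernel": ["linear", "poly", "rbf"],
              "C": [0.01, 0.1, 1, 10]}
grid = GridSearchCV(estimator=SVR(), param_grid=param_grid, cv=5)
grid.fit(train_days, train_cases)
svm_pred = grid.best_estimator_.predict(valid_days)
```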
What we see from the plot is that it is far better than linear regression: it follows the trend and the accuracy is also good. But we are not sure it will keep performing this well in the future. To go beyond that, we can move into neural networks.

So why an RNN? RNNs let us model time-dependent, sequential data tasks such as stock markets, machine translation, text generation, and many more. Suppose there are two sentences: "Bear Grylls runs a show" and "Run away from the bear." In the two sentences, "bear" means something different: a person and an animal. But until we have read the whole sentence, we cannot tell which "bear" is meant; likewise, an RNN cannot decide until it has been given the whole sequence.

However, RNNs suffer from the vanishing gradient problem. Let's see what goes wrong. This is the basic architecture of an RNN: at every unit there is an input, an output, and an activation passing from each cell to the next. This could give a good prediction, but there is the vanishing gradient problem. What is that? Vanishing gradients hamper learning on long data sequences. The gradients carry the information used in the RNN's parameter updates, and when the gradient becomes smaller and smaller, the parameter updates become insignificant, which means no real learning is done. Concretely: whenever a neural network learns, some loss comes out at the end, and we backpropagate it so that the weight updates are distributed across the network. But in an RNN, by the time the error has been backpropagated through a long sequence, it has become so insignificant that no learning happens in the earliest cells. The error never reaches the initial part of the network. That is the whole problem of vanishing gradient.

How do we overcome it? By using LSTM units. What is an LSTM? Long short-term memory. Here, learning happens at every unit, and every unit forgets some information and updates some from the error. An LSTM network has three gates that update and control the cell state: the forget gate, the input gate, and the output gate. The gates use sigmoid and hyperbolic tangent (tanh) activation functions to keep the signal alive, so the error does not become insignificant. The forget gate controls what information in the cell state to forget; the input gate controls what new information is encoded into the cell state; and the output gate controls what information encoded in the cell state is sent to the network as input in the following time step. You can see the various gates here and what they are doing: each applies an activation function, forgets some insignificant data, and remembers the important data. So memorization is happening at every step.

It is quite easy to code an LSTM. You can build a Keras model in the Sequential form, give it two LSTM layers, and then your output and activation layers.
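A minimal Keras sketch of the architecture just described; the layer sizes, epochs, and the windowed `X_train`/`y_train` shapes are assumptions for illustration, not the speaker's exact configuration:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

# X_train: windows of past case counts, shape (samples, timesteps, 1);
# y_train: the value that follows each window; X_valid likewise.
model = Sequential()
model.add(LSTM(50, return_sequences=True,
               input_shape=(X_train.shape[1], 1)))
model.add(LSTM(50))                 # two stacked LSTM layers, as in the talk
model.add(Dense(1))
model.add(Activation("linear"))     # activation on the output layer
model.compile(optimizer="sgd",      # stochastic gradient descent
              loss="mean_squared_error")
model.fit(X_train, y_train, epochs=50, batch_size=16)

forecast = model.predict(X_valid)   # predict on the validation windows
```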
Remember, the activation functions are important here for mitigating the vanishing gradient: the gradient vanishes because the loss signal keeps getting smaller, so the activations help keep it from becoming insignificant. You give an optimizer, stochastic gradient descent here, compile the whole model with the metrics and loss function you want, and just fit the data. For the plot, you forecast, save the results, and append the forecasts, and it looks something like this. You can see it is also following the trend, and it keeps only a small gap between the actual and the predicted values. It is better than the previous algorithms, but we can still do better.

For that, we can use ARIMA. For ARIMA you can use statsmodels: statsmodels.tsa is where you import it from. You build the series, fit, predict on the validation dates, and then look at the plots. Here you can see that the ARIMA model's predictions and the validation set are very close, and it is also following the trend.

Now something about Prophet. Prophet is a library from Facebook; you import it with `from fbprophet import Prophet`. You fit it on the day-wise confirmed cases and dates, and you forecast by making a future data frame. If you plot it, it looks something like this: the black dots are the actual data points and the blue line is the forecast.
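A sketch of both models, using the library APIs as they were around 2020; the ARIMA order and the column names are illustrative assumptions, not the speaker's exact settings:

```python
import pandas as pd
from statsmodels.tsa.arima_model import ARIMA  # statsmodels.tsa, older API
from fbprophet import Prophet

# ARIMA: fit on the training series, forecast over the validation period.
arima = ARIMA(train_series, order=(2, 1, 2))   # (p, d, q) placeholder
arima_fit = arima.fit(disp=0)
arima_forecast, stderr, conf_int = arima_fit.forecast(steps=len(valid_series))

# Prophet expects a DataFrame with columns `ds` (dates) and `y` (values).
df = pd.DataFrame({"ds": train_series.index, "y": train_series.values})
m = Prophet()
m.fit(df)
future = m.make_future_dataframe(periods=5)    # forecast 5 days ahead
forecast = m.predict(future)
m.plot(forecast)  # black dots: observed data; blue line: forecast
```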
Now let's compare all the algorithms by RMSE: moving average 42,000; linear regression 35,000; support vector machine, better, at 22,000; Prophet 9,000; ARIMA 4,000; and LSTM 10,000. So clearly the winner on this dataset is ARIMA. Thank you. If you need any help, you can contact me through email. The code is available at my GitHub account sona.pankaj95 and you can follow me on Twitter at www.pankajsona. Thank you everyone; I am open for questions.

There have been some questions. First of all, about some of the comments here: I understand where they are coming from, but the audience here is a bit wider than individual people expect it to be.

I can take the chats on Zulip; if there are more doubts, I can take them there.

No, I was trying to make a point here: I request people to be a bit more diplomatic in their chat comments, because the audience here is wider than any individual. And second, whatever feedback you have, please phrase it constructively; otherwise it is not helping. Now, one question that seems to come up concerns the training set; it feels like some information is still missing. Are there more notebooks that might provide more detail? I understand that, given the time constraint, this topic is wider than what one talk can cover, so if there are more references you can provide...

Yes, references. You can go to my GitHub; the datasets are available there. The dataset I have taken is the confirmed coronavirus cases dataset from Kaggle.

I think that information was missing, so can you please put it in the Zulip chat? CNN, Sahesh, and the others will find it there.

Surya Prakash asks: how do you use multiple variables in ARIMA? Multiple variables... one second. This model was univariate; it was the validation set being predicted, so everything here is univariate. I will take it offline and get back to him.

Sure. Okay, then thank you for the talk. She will also be available on Zulip for questions after the stage. CNN asks: what was your learning from testing out all these different models? Is that a question you can take up now?

Yeah. The learning is: look at what the problem is and try to make it better every time. Suppose linear regression is not performing well; I can move to other algorithms like SVM, or to a neural network. I didn't use a CNN here because I knew it might not be the right choice of algorithm. Choose the algorithm according to the question: since this is a sequential-data question, an RNN is a natural fit and will give you the right kind of answers, because RNNs are better at that kind of problem. So choose whichever algorithm is better for that particular scenario.

Okay. And Sampath says the code is not there on GitHub; please make that accessible.

Okay, I will make that available in the chat.

There are a lot of questions to see; someone has also asked what the other options compared to ARIMA are, and some more questions as well.

Yeah, sure, feel free.

Thank you. I'm going to bring you down from the stage now, but thank you so much for presenting.

Sure. Thank you.