Hey, I'm Vikrant. I'm a Python enthusiast, and I'm currently building a startup in Singapore: a product catalog manager and one of the largest global product repositories, meaning product information for commerce. You might laugh that I'm attempting something Google gave up on, but that's it. Before coming here, you may have read the article I wrote about how I got into machine learning and why. So I thought I would share my journey into machine learning with the Python group here.

Before starting: is everyone familiar with machine learning? No? Not really. So let's start with something very basic. If I told you today that I have a medicine which cures every disease in this world, would you believe me? Oh, really? You would? Very nice. I ask because today I see a lot of hype around machine learning and artificial intelligence everywhere, as if they were a panacea that can solve every problem in the world: there is one smart computer program running somewhere that will solve all the global problems for us. Machine learning is an advanced technology, but it is also quite an old one. Previously we couldn't do it because we didn't have enough storage and computing power; today we do. So in my view it is more of an evolution than a revolution: taking the knowledge we have gathered in mathematics, computational statistics, and linear algebra, and applying it to modern problems with modern computing resources and storage. In this talk I will concentrate on the realistic side of how I got into machine learning and how I use it to solve my own problems. Hopefully it helps you differentiate the hype from the reality.

So let me come back to the definition. What is machine learning? Machine learning is a subfield of computer science that gives computers the ability to learn without being explicitly programmed. That is the basic definition. It is a kind of artificial intelligence in which computer programs use computational statistics, linear algebra, and pattern recognition to try to predict something about unseen data. In my case, what I wanted to do was learn from the data that the Urban Redevelopment Authority (URA) releases for rentals and purchases in Singapore, and find out: given the address or location of a property and its square footage, can I predict a reasonable rental for it?

Machine learning is commonly divided into three parts: supervised, unsupervised, and reinforcement learning, whether you are talking about deep learning or about linear regression and traditional machine learning. In supervised learning, I train the program on labeled data, and then it predicts for unseen data based on that training. This is the most common form of learning; even AlphaGo required supervised learning initially, and only later incorporated reinforcement learning. Unsupervised learning is still an evolving field: I don't have historical labeled data, but I still try to find structure in unseen data.
It is mostly used for clustering. Then comes reinforcement learning, which is quite hot these days; Google has been using it, and I think at a past meetup someone presented on using reinforcement learning to learn stock-trading strategies. In reinforcement learning, I provide a reward function and train the program to optimize that reward function, and based on that it chooses a strategy. Just as a child learns about the world, we make our programs learn through reward functions.

Many of you might have read some of the literature on machine learning. When I started my journey four years ago, I did not start with machine learning at all. My problem was that my wife needed to organize an Asia-Pacific conference, and she had to seat 1,100 people at tables in such a way that each table mixed departments and, over three days, no two people would sit together twice. That was the problem I needed to solve. Seven people were going to arrange it manually in an Excel file, and it would have taken them seven to eight days. So I said, OK, I can help with that. I modeled the problem as a traveling salesman problem, prepared a matrix, and then looked for algorithms to solve it. The most common one is ant colony optimization; the second is simulated annealing. I used simulated annealing because my data was discrete; a rough sketch of the idea follows below. The output was an Excel file, which was not very good, I'll admit: she still needed to spend four hours fine-tuning it. But instead of seven days of work for seven people, it was four or five hours of work for one person. And that is how I got interested in scikit-learn, the Python library for machine learning.
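To make that concrete, here is a minimal sketch of the simulated-annealing idea applied to seating. This is not the program I actually wrote; every name and number in it (N_PEOPLE, the department labels, the cooling schedule) is invented for illustration, and it only handles the department-mixing constraint, not the three-day one. It randomly swaps two people and accepts a worse seating with a probability that shrinks as the temperature cools, which is what lets it escape local optima.

```python
# Minimal simulated-annealing sketch for table seating (illustrative only).
import math
import random

N_PEOPLE, N_TABLES = 100, 10
SEATS = N_PEOPLE // N_TABLES
departments = [random.randrange(8) for _ in range(N_PEOPLE)]  # fake labels

def cost(order):
    """Penalize tables that seat two people from the same department."""
    penalty = 0
    for t in range(N_TABLES):
        table = order[t * SEATS:(t + 1) * SEATS]
        depts = [departments[p] for p in table]
        penalty += len(depts) - len(set(depts))  # duplicates at this table
    return penalty

def anneal(order, t0=10.0, cooling=0.999, steps=50_000):
    current, temp = cost(order), t0
    for _ in range(steps):
        i, j = random.sample(range(N_PEOPLE), 2)    # propose a swap
        order[i], order[j] = order[j], order[i]
        candidate = cost(order)
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature cools.
        if candidate <= current or random.random() < math.exp((current - candidate) / temp):
            current = candidate
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
        temp *= cooling
    return order, current

seating, clashes = anneal(list(range(N_PEOPLE)))
print("remaining same-department clashes:", clashes)
```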
Now, if you know basic machine learning, these are the components of any machine learning system, and they cover the whole gamut of supervised learning and part of unsupervised learning. The first is the performance component: in my case, I want to predict a rental price from a house's location and square footage, so that is my performance component. Next is the target function: I define a relationship between those features and the rentals, because I have the historical rental data, the actual transaction data, from the URA. Then comes the training data: I had the URA training data, and I fit the function using it. Then the data representation: how do I represent the data? When I go into the actual details I will show you, but in this problem I needed to represent the data as a matrix, as NumPy arrays. Are you familiar with NumPy arrays? Here I represented the data as a pandas DataFrame, converted that DataFrame into a NumPy array, and fed it to the algorithm.

In any machine learning problem, the first five components are designed by humans, so you can never take the human out of machine learning. For AlphaGo, they needed 30 million positions of data and seven months of work by 30 top-notch scientists just to prepare the first version. So it is a lot of work, actually. The last component is the hypothesis space, the space within which I am trying to predict. In Google's case, they prepared a program to predict strategies for AlphaGo; the same program cannot be used for chess, because it only solves problems in its specific hypothesis space.

That was a brief introduction to machine learning. Rather than going deeper into the theory, I will focus on the practical side of how I did it. My problem statement I have already told you: given the URA data, I just wanted to fiddle with it and find out what the rent for a house should be, given a certain location and size.

But before coming to that, my question was: why Python? The main reason I always choose Python is the Zen of Python. It fits my philosophy; it fits my brain. Not every library follows it exactly, but it is one of the reasons I like the language: beautiful is better than ugly, explicit is better than implicit, and I think everyone who works with Python should be aware of the Zen of Python. The second reason is that Python has excellent data science libraries, especially for data manipulation: pandas, scikit-learn, and later TensorFlow and Keras. When I started, I did not know all of this; I only knew scikit-learn, because I was looking for a simulated annealing algorithm, and that is how I got started.

When you start with machine learning, the basic problem is how to approach it. My own problem was that I left computer science 19 or 20 years ago, and in a corporate job since then I never needed to do low-level mathematics. Everybody going into machine learning faces the same question: where to start and how to start, with so many tutorials and so much information available. Solving that traveling salesman problem, I realized one thing: if I can model a problem as a linear algebra problem, as matrices, and define the result as a matrix, I can feed that data to an algorithm and it will run and give me results. So I brushed up my linear algebra a little using Khan Academy and one course on edX; very simple, high-school-level mathematics. I still do not understand all the internal mathematical details of each and every algorithm, so I experiment with the models and the data and see what fits my problem.

This time, my problem was that I needed to move house. There are two websites in Singapore, PropertyGuru and 99.co, where you can find which rental places are available. The problem is that the agents' asking prices vary a lot, sometimes 30 to 40% from the actual rental data from the URA. So I wanted to know whether what an agent was asking was reasonable. One easy way is to go to the URA website, look at the property data, and find out. So let me first show you the data. Can everyone see this? This is the actual URA property data. I search by postal district; because I live in district 15, I'll just search for district 15, and here you can see it. I can't get a connection, but okay.
I can show you the raw data instead; I already downloaded it. This is the data you get from the URA: serial number, building/project name, street name, postal district, type, number of bedrooms (for non-landed property only; since I was looking at landed property, I just get NA there), monthly gross rental, floor area in square feet, and lease commencement date.

The first step is to explore the data, and for that I use pandas, Python's data toolkit. I import numpy, math, pandas, and sklearn; sklearn is the scikit-learn library. What I did here is load my URA landed-property dataset, which I prepared by combining multiple files and which I call the testing data. It is very simple: I call read_csv to load the data into a pandas DataFrame, and then I print the first ten rows with head.

Now, when you want to define a machine learning model, you first need to find the relationship between what you want to predict and what you will use for the prediction; there has to be some correlation between them. And often, the fewer the features, the better. So from the data I removed the time dimension, because I was not interested in time; the data was month-by-month. I was only interested in building/project name, street name, postal district, type, floor area, and monthly rent, so I created a new dataset containing only those columns, like taking an Excel file and deleting a few columns.

One problem you always have is that you need to clean the data, because the data you get will never be clean. There will always be some cleaning to do, and often 80% of your time goes into preparing and fitting the data properly; only 20% is actually spent testing and working with the algorithm. In this case, I found that postal district came in as a number, but it is really categorical data: districts 15, 10, 12, 11 are labels, not numerical values. So the first thing I did was convert that number into a string; that is the second step here.

Then I just described the data. In the output, the upper part prints those first ten rows, and below that is the summary. Once you load the data into a pandas DataFrame, you can do a lot of different analyses and generate a lot of graphs with a single command. Here I just call data.describe(), and it automatically figures out which columns are numerical and gives you the count, mean, standard deviation, the 25%, 50%, and 75% quartiles, and the max. If you look at it, the maximum rental in Singapore in that period was 54,000 per month, and for district 15 the highest is 41,000 and the mean is 5,866; landed properties on the East Coast are quite big, actually. So that is exploring the data.
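Here is a minimal sketch of that exploration step. The file name and column labels are my assumptions; the exact ones depend on how the URA export was combined.

```python
# Minimal data-exploration sketch; file name and column labels assumed.
import pandas as pd

data = pd.read_csv("ura_landed.csv")
print(data.head(10))                      # first ten rows

# Keep only the features of interest (drop the time dimension, etc.)
cols = ["Project Name", "Street Name", "Postal District",
        "Type", "Floor Area", "Monthly Gross Rent"]
data = data[cols]

# Postal district is a label, not a quantity, so store it as a string
data["Postal District"] = data["Postal District"].astype(str)

print(data.describe())                    # count, mean, std, quartiles, max
```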
Now I have the data and I have analyzed it; next I need to prepare it for the machine learning algorithms provided by scikit-learn. How many of you know how machine learning works? It is not magic. It is very simple, actually: you have a function f(x) with many, many variables; you give it some values and you get a result. Normally a hand-written function can capture only a simple relationship, x and y, or x, y, z. In machine learning, through computational statistics, I try to build a function whose output is very close to the actual result; that is what the algorithms do, they optimize the function. Obviously there is a lot of complex mathematics involved, but in practice it is just this: I represent my feature data as a matrix, with all the features I want to give as input and the corresponding result set, pass both to my algorithm, and then for unseen data the algorithm gives me a result.

If you read the material on the internet, what everyone talks about is preparing the model, cross-validating the model, and reporting the standard error. But nobody talks about how to save the model, or how to turn it into an API that others can use. I have seen many, many examples, and nobody gives you the end-to-end details; they mostly evaluate different algorithms and report the mean squared error or root mean squared error for each. I wanted to go a bit beyond that, so let me go through it step by step.

The first part was just exploring the data. Now, machine learning has one fundamental requirement: the algorithms do not work with categorical data; they need numerical data. That is a real problem here, because among my six columns is the postal district, which is categorical, and even the floor area is not an exact number but a range. So this categorical data is not acceptable to any of the machine learning algorithms. Normally you could write a separate program to convert your categorical data into numerical data. It works like this: suppose I have one row with postal district 15 and monthly rental 6,000, and another with postal district 10 and monthly rental 3,000. Each district value becomes its own column; the value is 1 where that district applies and 0 everywhere else. The categorical values become columnar data. You can do it programmatically, but the very good thing about pandas is that it provides a function for exactly this: get_dummies. So I ran get_dummies, and afterwards I printed two rows so you can see how the data looks.
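As a tiny illustration of what get_dummies does (the column names here are invented):

```python
# One-hot encoding sketch with pandas; column names invented.
import pandas as pd

df = pd.DataFrame({
    "Postal District": ["15", "10"],
    "Monthly Gross Rent": [6000, 3000],
})
encoded = pd.get_dummies(df)
print(encoded)
# Each district value becomes its own indicator column
# ("Postal District_10", "Postal District_15") holding 1/0 flags
# (booleans in newer pandas versions).
```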
In the get_dummies output you can see, for each row, the monthly gross rental, 4,300 or 7,000, and then a row of zeros and ones. In the second row, for example, the column for 4,000 to 4,500 square feet is one and the column for 4,500 to 5,000 square feet is zero, so that row is a 4,000 to 4,500 square foot property. That is all get_dummies does: it converts your categorical data into zeros and ones, into numerical data. That is the step you need, and after that it is very simple.

So I ran get_dummies and printed two rows to check that the data was okay. In the third step, I turned the encoded dataset into a NumPy array. This command prints the shape of the NumPy array, how many rows and columns it has, and head(10) prints the top ten rows; I also printed the shape of my original pandas DataFrame. If you look below, the original data is (5297, 6), and as soon as I apply the function, six columns become 652 columns: (5297, 652). Now, among those 652 columns, one column is the result set, the actual monthly gross rental, so I need to separate it out. Once I have the dataset, I divide it into two parts, X and y: given these feature values, this is my result. It is just a NumPy array, and I use NumPy slicing to divide it in two. I will explain this part a bit later, okay? This line I commented out.

Then, from sklearn, I imported ElasticNet and RandomForestRegressor. If your problem is finding a numerical value, you use regression. If your problem is classification, you will not be using regression; you will be using classification algorithms such as k-nearest neighbors. The classic classification problem is spam: spam or not spam, classifying your messages into two classes. Mine is a regression problem, because I want to predict a numerical value, and those are two algorithms for regression.

Now, what I wanted to do first was check which algorithm is actually good for this data, which one gives me the least error, before going further into the details. So I defined the models and then used scikit-learn's cross_val_score. What cross-validation does is divide the data into ten parts: nine parts are used for training and one part for testing, and then it does this again and again, rotating which part is held out. My data has 5,297 rows, correct? It divides them into ten folds, trains the algorithm on nine folds, and tests by predicting the tenth. Which nine does it choose? It divides them automatically; I don't need to do it manually. If I were writing all the machine learning myself, I would write all of this by hand, but since Python provides these libraries, I don't need to write that code; a sketch is below.
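A minimal sketch of that comparison, with synthetic stand-in arrays so it runs on its own; I score with mean absolute error, which seems to match how the average dollar error is reported in a moment:

```python
# 10-fold cross-validation comparison sketch; X and y are stand-ins
# for the encoded URA arrays.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 5))            # stand-in for the 652 encoded columns
y = 1000 * X[:, 0] + 4000           # stand-in for monthly rents

models = {
    "elastic_net": ElasticNet(),
    "random_forest": RandomForestRegressor(n_estimators=100),
}
for name, model in models.items():
    # Scored as negative MAE so "higher is better"; flip the sign
    # to read it as an average dollar error.
    scores = cross_val_score(model, X, y, cv=10,
                             scoring="neg_mean_absolute_error")
    print(name, "average error:", -scores.mean())
```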
I just call the cross-validation function and run it. So if I run it here, you can see: ElasticNet had an average error of 3,388, meaning 3,388 Singapore dollars of error on average, okay? And here comes random forest: 2,095. So, in your view, which one is better? Random forest, right?

Before going to the next part: I also mentioned StandardScaler. Suppose you have an equation, 2x + y = some result, where x is around 500 and y is only around 10. Which will influence the final result more, x or y? A small variation in x will impact the result very heavily. So, to bring the features onto similar scales, I use StandardScaler, which automatically converts the data to a common scale. If you look at scikit-learn, there are two kinds of scaler you can use here: RobustScaler and StandardScaler. You saw just now that the average error was 3,388; make the data standard-scaled and you see the direct impact: my error is reduced dramatically. This is the kind of exploration and work you need to do to figure out which algorithm is best for you, okay?

Now, okay, I have defined my algorithms and tested them; ElasticNet is decent and random forest is good. The next question is: I have the model, so how can I use it to predict a value? In scikit-learn, all models provide certain functions. There is .fit, which is used for training the model. Once I have trained my model, I can use model.predict to predict an unseen value, and that is what I am doing: I take the URA data, convert it into a matrix, do some transformation on the data, run model.fit, and then with model.predict I can actually predict.

One thing to remember about model.predict: in older versions it could accept a 1D array. What does that mean? Suppose the user asks: in district 15, what will 1,500 square feet cost? Internally you need to convert that query into a NumPy array on which to predict. In newer versions, predict no longer accepts a 1D array for a single sample, so I need to use .reshape to give it the data in the right shape.

And you can see how the algorithm currently performs against actual data. The actual original rental is 4,300; let me make it bigger so you can see at the back. I just printed it here; I didn't have the time to finish a whole web front end and create an API, but it is very easy to do once you have come this far. The actual rental was 4,300; the one predicted by my program is 4,957, and for random forest it is 4,700, a margin of about 10% error. I did not spend a lot of time training and tuning, but you can get fairly good predictions this way.
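A minimal sketch of scaling, fitting, and predicting a single query; X and y are stand-ins again, and the query row is invented. The reshape(1, -1) at the end is the fix for the 1D-array issue just mentioned:

```python
# Scale, fit, and predict one unseen query; stand-in data throughout.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 5))              # stand-in for the encoded features
y = 1000 * X[:, 0] + 4000             # stand-in for monthly rents

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)    # zero mean, unit variance per column

model = RandomForestRegressor(n_estimators=100)
model.fit(X_scaled, y)                # .fit trains the model

# A single unseen query must be 2D: shape (1, n_features), hence reshape.
query = rng.random(5).reshape(1, -1)
print(model.predict(scaler.transform(query)))
```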
Now the next problem comes up: you have trained the model, and you want to save it, so that later on, in another program, you can just load the model and use it to predict, correct? That is something you can do very easily. With scikit-learn you use joblib: I call one function, joblib.dump, passing the model, for example the ElasticNet model, and the name I want to save it under. It saves a few files: the model information together with the NumPy arrays inside it, stored on your disk.

Then, here, I show one example where I load the random forest model back from the saved file with joblib.load and predict using it. The results come back very quickly, because it doesn't need to go through the training stage. And if you look, this result and the one above are almost the same, because it is the same model; I just loaded it. In practice, when you build this kind of machine learning program, you take your data, train your model, save the model, and then expose it as a web service; in the web service, you take from the user only the input you need. A minimal sketch of the save-and-load step is below.
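A minimal sketch of the dump-and-reload cycle; the file name is a placeholder. Newer scikit-learn versions use the standalone joblib package, while older ones exposed it as sklearn.externals.joblib:

```python
# Persist a trained model and reload it without retraining.
import numpy as np
import joblib
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = 1000 * X[:, 0] + 4000

model = RandomForestRegressor(n_estimators=100).fit(X, y)
joblib.dump(model, "rf_rental_model.joblib")    # save to disk

loaded = joblib.load("rf_rental_model.joblib")  # reload in another program
print(loaded.predict(X[:1]))                    # predicts immediately
```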
Now one problem comes up. I had six columns; take away the actual gross rental and there are five features left: postal district, building name, street name, type, and square footage. Suppose the user says: I only know the district; what are the prices in that district? If you want to solve this problem, how would you approach it? Can anybody answer? It is actually a different problem, because what you trained on is five features; you cannot supply one feature and predict. If you want to predict from only one feature, you need to train a model on that one feature. So in practice you will have seven or eight models saved on your disk, and depending on the user's query, you invoke the specific model for that prediction. This is something I learned through trial and error, because I did not find any blog post explaining it. And this was just for fun, for my own home; I did it purely out of personal interest and didn't use it anywhere else.

Okay, that part is done. Now, if I train it more, the predictions will get closer, because I used a very limited set of data and have not optimized it yet. Currently I took only about 5,000 rows, and the URA data is around 200,000, so it can certainly be improved; I had to look for the exact parameters. My feeling is that the issue is the square footage: square footage influences the rent strongly, but I am using it as categorical data, as a range. That is why there is a positive bias in the predictions. If I had the actual square footage as a numerical value, through REALIS (I subscribe to URA's REALIS data service), I think my predictions would be spread more evenly on both sides.

But you're right, there is mostly a positive bias, not a negative one. When I changed the training parameters there was a negative bias, but then the MSE, the mean squared error, was much higher, so I tuned it to something reasonable. There is a negative bias here, on the higher values: here, here, and here. So in some places there is a negative bias and in some a positive bias, but the average error, as I said, is about 2,400.

So, what did I do? I took the data, converted it into matrices, dumped the model, and used the model to predict, all with fairly traditional regression algorithms, and I only applied two. I did not use support vector regression, the SVR algorithm, because it took at least 20 hours to finish one round, so I simply didn't. ElasticNet and random forest were pretty quick, and random forest gave me reasonably good results, so I relied on it for my experiments.

Then I was thinking: can I apply deep learning to this regression problem on the housing data and see whether it works, and whether it is better than my traditional approach? Again, this is a demo, so I did not spend a lot of time on epochs and all the optimization that needs to be done in TensorFlow to get the best results; I tried it in a very simple way.

How many of you are aware of deep learning and how it works? Deep learning is a branch of machine learning where you model your problem with a structure loosely similar to the brain: you have a simple function, you optimize it, pass the result from layer to layer, and predict based on that. We still don't have a fully sound mathematical theory proving why deep learning works; we have some proofs, some explanations, some details, but in some areas we still don't understand how or why it works. For linear regression, by contrast, we have a complete mathematical account of how training works and where the values come from.

Anyway, let me resize this. I am not so proficient in deep learning, so I looked at a lot of material and found two or three tutorials, but they were quite dense, so I just used the Keras library and its documentation to develop the model. This is my neural net model, which I compile with a mean squared error loss; the activation is ReLU, the kernel initializer is 'normal', and the input dimension is 651. Sequential is a built-in facility in Keras for building your neural net, and this is a simple one-layer network with a Dense layer of 651 units. 651 is the number of columns, so my input dimension is 651. The layer sizes control the topology: if I make the layer 800 units, the network becomes wider; if instead I add another layer of 300, then 200, it becomes a deeper topology. A rough sketch of this baseline is below.
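A minimal sketch of that baseline, using standalone Keras with the TensorFlow backend as in the talk (newer setups import from tensorflow.keras). I have added the single-unit output layer a regression network needs, which the talk doesn't spell out:

```python
# Single-layer baseline regression network; 651 matches the encoded columns.
from keras.models import Sequential
from keras.layers import Dense

def baseline_model(n_features=651):
    model = Sequential()
    # One hidden layer as wide as the input, ReLU activation,
    # weights initialized from a normal distribution.
    model.add(Dense(n_features, input_dim=n_features,
                    kernel_initializer="normal", activation="relu"))
    model.add(Dense(1, kernel_initializer="normal"))  # single rent output
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model

model1 = baseline_model()
model1.summary()
```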
The ReLU activation function comes from deep learning; it is the function the network applies and optimizes. In one blog post I found an example that visually shows how the values get optimized. The model tries to minimize the mean squared error, and the optimizer it uses is Adam; there is another optimizer called SGD, stochastic gradient descent. Obviously I don't understand all the mathematics behind them, so you can consider me just a user, but I experimented with both to find out which gives the better prediction.

So here I define my baseline model, a single-layer model. I can define deeper ones; they were there, but I removed them because they take a very long time to run. Model one is the baseline model: I defined it as a function and then invoked it. Then X.shape; X and y I defined earlier. Then I just call model1.fit. Training a deep learning model is similar to training a traditional model: model.fit(X, y, epochs=..., batch_size=...). Epochs is how many times it runs over the data, and batch size is how much data it takes on each run. You need to experiment with these two to find the most optimal result, and you need to run it many times to see what results you get. If you use 10 or 20 epochs, your results will vary more; with 1,000 or 2,000 they will be better, but that takes a pretty long time on my MacBook, so for the demo I keep it at 150, and even 150 takes a bit of time.

Once I have trained the model, I try to predict. But before predicting, I should test the model: what is its mean squared error in prediction? That is how you evaluate it. Your first stage is to check whether your model gives a low mean squared error, and then you modify your neural net again and again until it gives you the least MSE. If I run this, it uses the TensorFlow backend by default. This prints the model details, and this is the shape of the arrays X and y; it will take a little time to run. Has anybody here tried using deep learning for regression? It was the first time for me too; this was just my own interest. The majority of deep learning examples are about RNNs, recurrent neural networks, or CNNs, convolutional neural networks, doing image recognition on ImageNet data; there were not many examples of using deep learning for plain regression.

It will print the results. Now, again, the same thing: you should be able to save the trained model and use it later, correct? Because you don't want to rerun 20 hours of training once the model is trained. For that, Keras provides the model.save method, which stores the model in HDF5 format, so I save it as deep-network-tensorflow-basic-model-1.h5. Then I loaded the same model back and predicted the rental. These are the results from using deep learning for this regression, and here it calculated in front of you: 4,808.74, which is different from the earlier predictions, correct? And what I wanted to show you is the MSE: the MSE is so high here because the model is not optimized.
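A minimal sketch of the train, save, and reload cycle; baseline_model is the function from the sketch above, the file name is a placeholder, and the batch size is my assumption:

```python
# Train, save to HDF5, and reload the Keras model; stand-in data.
import numpy as np
from keras.models import load_model

rng = np.random.default_rng(0)
X = rng.random((200, 651)).astype("float32")        # encoded-column stand-in
y = (1000 * X[:, 0] + 4000).astype("float32")       # rent stand-in

model1 = baseline_model()                           # from the sketch above
model1.fit(X, y, epochs=150, batch_size=32)         # tune both by experiment

model1.save("deep_model_1.h5")                      # HDF5 file on disk

reloaded = load_model("deep_model_1.h5")            # no retraining needed
print(reloaded.predict(X[:1]))
```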
I didn't do a lot of work on it. While that one takes a bit of time to run: how do you evaluate a deep learning model? For this, Keras provides a wrapper called KerasRegressor, and that is precisely what I am using here. I changed certain parameters: I set epochs to five, meaning it runs only five times, with a batch size of 1,000 each time, so it finishes quickly, although it gives me a very large MSE. The lower the batch size and the larger the number of epochs, the lower the mean squared error; you can see the error here, and this is the standard deviation for the deep learning run.

After experimenting a lot with all these algorithms, I found that for the URA data the best algorithm so far has been random forest: very reliable, very predictable results, and at least I understand why and how it is doing what it does. Here I saved the model in the upper part; now I load it and run it. I am about to finish, actually.

One thing in this model you might have missed: you don't have to take just the URA data. You can combine the URA data with external data, if you know the data well and can get it at the same time scale as your actual data. Here is an example I was thinking about while working on this problem. If you look at housing rentals or purchases in Singapore, the rental market is not driven by Singapore locals, because home ownership here is 87.9%, which is pretty high for a developed country. The rental market is driven by PRs and by holders of the EPs and WPs issued by the Singapore government. So if I take the employment pass data from MOM and use it to train my model, my rental predictions will be better aligned: based on the number of passes issued, what will the actual rental for a place be? Second, property purchases in Singapore are influenced by SIBOR, the Singapore interbank offered rate. I can get that data, incorporate it as one more feature, one more column added to my existing data, and then it becomes part of my machine learning; that parameter is covered, and SIBOR definitely impacts future property purchases: if SIBOR changes this month, even a minor variation, two months later it impacts the property market.

So what I learned is this: before coming into machine learning, the first thing is to define the problem you want to solve. Second, see what data is available for that problem and how that data correlates with the result you want. You don't need a complete, 100% idea up front, because, as you saw, I can run various algorithms and evaluate their efficacy on seen and unseen data, and the more data I get, the better the results get.

I think that is it for my presentation; this is what I wanted to cover today. I hope you find it useful. Next time I will try to develop a web service and a user interface based on this, and then I can cover that part. But for the time being: any questions?