As I discussed in the last video, supervised learning involves using input data to predict a class or value that is supplied to the algorithm during training. In this video, we'll take a closer look at how a machine learning model actually makes its predictions. Let's look at predicting SAT scores based on GPA, an example of regression: predicting a continuous numerical output from an input. As you can imagine, generally speaking, as GPAs increase, SAT scores increase. Let's begin by visualizing this imaginary data set. It does look like our hypothesis was correct.

Although the relationship doesn't look perfectly linear, let's begin with linear regression. Now, you might be wondering what linear regression, which you've likely been doing in math class for a few years now, has to do with machine learning. Well, it's a basic form of regression that can give an idea of how deep neural networks work. As you know, the equation of a line is y = mx + b, where m is the slope and b is the y-intercept. By manipulating these two constants, we can create any line in the xy-plane. Fundamentally, just like any function, the equation takes an input, manipulates it, and gives an output. So to predict SAT scores based on GPA, we can just play around with the m and b values until the line matches up with our data. Of course, in practice the model would automatically adjust these values to minimize a number representing the difference between the actual data and the line, which represents the model's predictions. We can call this difference the error.

Now, let's step it up a level and play around with a fifth-degree polynomial. Again, we manipulate the constants to match the function to the data. We see that this allows us to model a more nuanced relationship: our model now correctly predicts that moving from a 3.9 to a 4.0 GPA likely corresponds to a larger SAT score difference than moving from a 3.0 to a 3.1.

For even more complex relationships, computer scientists use artificial neural networks. Let's see how these work. The basic idea is that by multiplying input values by weights, summing, and spicing things up with nonlinear functions such as the sigmoid, we can approximate any function. Fundamentally, we are still just manipulating input data to get an output. So we can start with the number 3 as the input, perhaps representing a GPA. This is multiplied by each of the weights, and then we plug each of these values into the sigmoid function. Then we multiply these numbers by weights again and add them up. The number that we get is the output, just like with linear or polynomial regression. Note that we could just as easily start with multiple inputs, representing multiple characteristics of a data point. In our example, perhaps we input both GPA and PSAT scores. In function notation, this would look like f(x, y), and with a neural network, it would look like this.

We can also easily perform classification instead of regression. One easy way to do this is to simply take our output and plug it into a sigmoid function, which returns a number between 0 and 1. If the output is above 0.5, we say the data point belongs to the first class; if it is below 0.5, we say it does not. Before we talk about training, let's make these ideas concrete with a couple of code sketches.
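First, the two regression fits. This is a minimal sketch in Python; the GPA/SAT numbers are made up for illustration, and NumPy's polyfit stands in for the automatic adjustment described above, finding the constants that minimize the squared error for us.

```python
import numpy as np

# Made-up GPA/SAT pairs, purely for illustration.
gpa = np.array([2.8, 3.0, 3.2, 3.5, 3.7, 3.9, 4.0])
sat = np.array([1050, 1100, 1130, 1210, 1290, 1400, 1480])

# Linear regression: find the m and b that minimize the squared error.
m, b = np.polyfit(gpa, sat, deg=1)

def line(x):
    # The model is just the line: take an input, manipulate it, return an output.
    return m * x + b

# Fifth-degree polynomial: six constants to tune instead of two.
poly = np.poly1d(np.polyfit(gpa, sat, deg=5))

print(line(3.5), poly(3.5))   # each model's predicted SAT score for a 3.5 GPA
print(poly(4.0) - poly(3.9))  # the curvier model can make this gap larger...
print(poly(3.1) - poly(3.0))  # ...than this one
```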
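Next, the neural network walkthrough as code. The weight values here are arbitrary placeholders rather than trained ones; the point is only the shape of the computation: multiply by weights, apply the sigmoid, multiply by weights again, and sum.

```python
import numpy as np

def sigmoid(z):
    # Squashes any number into the range (0, 1).
    return 1 / (1 + np.exp(-z))

# Arbitrary, untrained weights: three hidden units between one input and one output.
w_in = np.array([0.4, -0.6, 0.9])    # input -> hidden weights
w_out = np.array([1.2, 0.5, -0.8])   # hidden -> output weights

x = 3.0                              # the input, perhaps a GPA

hidden = sigmoid(w_in * x)           # multiply by each weight, then apply the sigmoid
output = np.sum(w_out * hidden)      # multiply by weights again and add them up
print(output)                        # a single number, just like regression

# For classification, squash the output into (0, 1) and threshold at 0.5.
print("first class" if sigmoid(output) > 0.5 else "not first class")

# Making the network deeper just means inserting more weight layers (more
# multiplications, additions, and sigmoids) between the input and the output.
```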
With linear regression, we adjust m, the slope, and b, the y-intercept, to minimize the difference between the actual data and the prediction. Similarly, with polynomials we adjust the constants, and with neural networks we adjust the weights, to minimize this error. Fundamentally, training just means making small adjustments to a model's internal representations to minimize error. In the case of neural networks, a bit of calculus is used to repeatedly adjust the weights, a little at a time. So if we know that, at a certain weight value, increasing the weight by a little will reduce the error, we can slightly increase the weight to decrease the error. Calculus allows us to do exactly this by finding the derivative of the error with respect to a particular weight. Essentially, we find how the error varies when we change the value of a weight by a tiny amount, and use this information to update the weight. After training, the constants, or weights, express the patterns found in the data.

Once you understand the basics of neural networks, it is easy to help them find more complex relationships, at least up to a point. The idea is that we can just add more layers to the neural network, which is also known as making it deeper. This means that the network will do more multiplications, additions, and nonlinear transformations. A deeper network is trained in exactly the same way as before, and it can potentially model more complex functions.

Hopefully, you now have a basic understanding of how internal representations are just the way that models get from input to output, and how, by adjusting those internal representations to minimize error, models can find all sorts of patterns. In the next video, we will take a closer look at how models are trained; as a preview, a minimal sketch of such a training loop is below.
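Here is that idea as a sketch, again with made-up numbers. Instead of real calculus, it estimates each derivative numerically by nudging a constant a tiny amount and measuring how the error changes, then steps each constant slightly downhill; training a neural network's weights follows the same loop, just with more constants and with the derivatives computed analytically.

```python
import numpy as np

# The same made-up GPA/SAT pairs as before.
gpa = np.array([2.8, 3.0, 3.2, 3.5, 3.7, 3.9, 4.0])
sat = np.array([1050, 1100, 1130, 1210, 1290, 1400, 1480])

def error(m, b):
    # Mean squared difference between the actual data and the line's predictions.
    return np.mean((sat - (m * gpa + b)) ** 2)

m, b = 0.0, 0.0  # start with an arbitrary line
lr = 0.05        # how far to step the constants each iteration
eps = 1e-4       # the "tiny amount" used to probe the error

for _ in range(20_000):
    # How does the error vary when each constant moves by a tiny amount?
    dm = (error(m + eps, b) - error(m - eps, b)) / (2 * eps)
    db = (error(m, b + eps) - error(m, b - eps)) / (2 * eps)
    # Step each constant slightly in the direction that reduces the error.
    m -= lr * dm
    b -= lr * db

print(m, b)  # after training, these constants express the pattern in the data
```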