Great, can we prove this general property that this approximation is good? Well, we are assuming that the functions are continuous. One way of doing that is to assume Lipschitz continuity with constant K. What that means is that if we have x1 and x2, two positions where we can evaluate the function, then the difference in the outputs of the function is smaller than the constant K times the difference between x1 and x2, that is, |f(x1) − f(x2)| ≤ K·|x1 − x2|. Under this assumption, within each interval the true function can drift away from the sampled value by at most K times half the interval width, so the error is bounded by (1/2)·K·Δx. And keep in mind that as n goes to infinity, Δx goes to zero. With that logic, we can ultimately prove that this approximation with sampled values is good.

Now, let's go to multi-layer perceptrons. What is a multi-layer perceptron? It's just a function: it has some input, it has some output. But it's a specific kind of function, one that we can write by successively applying linear transformations and nonlinearities. So, let's look at these components. Note that we have W1, W2, W3 and so on and so forth, which are the weight matrices in those layers, and here we have a notation where the biases are b1, b2, b3. And of course, after each linear transformation, we have the nonlinearity, which we write here as sigma. So, what do we have? We have sigma(W1·x + b1): we go through a linear transformation, then we apply a sigma, we go through another linear transformation, apply a sigma, and so on and so forth.

Now, there's specific nomenclature that we use for this. What's a layer? Well, a layer is one of the intermediate vectors. What's a neuron? A neuron is one of the elements of those vectors. What's the depth? The depth is the number of layers that we have. What's the width? The width is the dimension of a layer, the length of the vector. What are the weights? The weights are the matrices that connect one layer to the next. What are the biases? These are the b's that we add after each linear transformation. And then we have the activation function, or nonlinearity, which is the function sigma that we're using here. So, these are the words that we use when talking about this function, and it's a specific kind of function that is relatively easy for us to optimize.

So, let us design multi-layer perceptrons now. What do we need to initialize? Well, we need to know all the layers: give me the number of layers, their sizes, and their nature. Then we will need to define the forward computation; keep in mind that autograd will give us the backward computations that we need to get the gradients. So, what do we need? Get the data in the right format, apply a linear layer-to-layer transformation, and then apply a nonlinear function. That's the general logic of MLPs. And of course, that pattern of linear transformations followed by nonlinearities will be stacked over multiple layers, with the loss function on top.

Now, there are three ways of creating a neural network in PyTorch. The first one is the Module way of doing it, then we have the Sequential way of doing it, and then we have the ModuleList way of doing it. These different ways of defining neural networks make certain ways of interacting with neural networks easier and others harder. So, it is time for you to learn about them by writing a general function here.
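To make those three styles concrete, here is a minimal sketch (my own illustrative code, not the course's reference implementation) of the same small two-layer MLP written in each of the three ways; the layer sizes and the ReLU activation are arbitrary choices for the example.

```python
import torch
import torch.nn as nn

# 1) The nn.Module way: subclass and write the forward computation explicitly.
class MLPModule(nn.Module):
    def __init__(self, in_dim=4, hidden_dim=8, out_dim=2):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)   # weights W1 and bias b1
        self.fc2 = nn.Linear(hidden_dim, out_dim)  # weights W2 and bias b2
        self.act = nn.ReLU()                       # the nonlinearity sigma

    def forward(self, x):
        # sigma(W1 x + b1), followed by the final linear map
        return self.fc2(self.act(self.fc1(x)))

# 2) The nn.Sequential way: just list the transformations in order.
mlp_sequential = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

# 3) The nn.ModuleList way: store layers in a list and loop over them in forward.
class MLPModuleList(nn.Module):
    def __init__(self, sizes=(4, 8, 2)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)
        )
        self.act = nn.ReLU()

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = self.act(layer(x))   # linear map followed by the nonlinearity
        return self.layers[-1](x)    # no activation after the last layer

x = torch.randn(3, 4)                # a batch of 3 example inputs
print(MLPModule()(x).shape, mlp_sequential(x).shape, MLPModuleList()(x).shape)
```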
So, what we'll now do is build a general function that makes homogeneous MLPs. We will tell it the input size, the number and sizes of the hidden layers, the output size, and of course which activation function we choose. That way, we will be able to efficiently construct MLPs and compare MLPs that are built in slightly different ways. For example, it will allow us to ask: what would have happened if we had chosen a different activation function? So now, it's upon you to build it.
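For reference, one possible sketch of such a builder, using the Sequential style; the function name make_mlp and its argument names are illustrative choices of mine, not something fixed by the lecture.

```python
import torch
import torch.nn as nn

def make_mlp(input_size, hidden_sizes, output_size, activation=nn.ReLU):
    """Build a homogeneous MLP: the same activation after every hidden layer."""
    sizes = [input_size, *hidden_sizes, output_size]
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:          # no activation after the output layer
            layers.append(activation())
    return nn.Sequential(*layers)

# Compare two MLPs that differ only in the choice of activation function.
mlp_relu = make_mlp(4, [16, 16], 2, activation=nn.ReLU)
mlp_tanh = make_mlp(4, [16, 16], 2, activation=nn.Tanh)
x = torch.randn(5, 4)
print(mlp_relu(x).shape, mlp_tanh(x).shape)   # both: torch.Size([5, 2])
```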