Greetings fellow learners! This is going to be the first video in a playlist of videos to get up to speed on deep learning and neural networks. But before we embark on our journey, I have a thought-provoking question for you. What is the most important AI in your life? Or, phrased another way, what AI do you use on a daily basis? Is it ChatGPT? Is it the self-driving feature in your Tesla? Or is it perhaps something more inconspicuous? Please comment down below; I would love to hear your thoughts.

Now, this video is going to be divided into three passes. In the first pass, we're going to go through an overview of what neural networks are. In the second pass, we're going to dive into some concepts. And in the third pass, we will code out your first neural network in PyTorch. Also, pay attention, because I'm going to quiz you along the way. Now, let's get to it.

What is the difference between AI and neural networks? AI is a system that can perform one or more tasks, and to solve these tasks, AI systems can be of different types. Neural networks are a type of AI system inspired by neurons in the brain. Other types of AI systems include rule-based systems and classical machine learning models, among many others. But overall, these AI systems take an input and return an output. So in a mathematical sense, AI systems are functions. And because they are functions, they can solve tasks of the form "determine Y given X." That means they can solve, for example, any of these tasks. Now, these are still just a drop in the bucket of all the tasks that AI can solve, but I think this is a good starting point.

Quiz time! Have you been paying attention? Let's quiz you to find out. Which of the following tasks can be solved using neural networks? A. Generate a response given a question. B. Determine a credit score given a person's transaction history. C. Determine a home price given home information. Or D. All of the above.
Now comment your answer down below and let's have a discussion. And at this point, if you think I deserve it, please do consider giving this video a like, because it will really help me out. That'll do it for quiz time for now, but pay attention, because I will be back to quiz you.

Now, in this second pass, I want to describe different concepts in neural networks, starting with their architecture. So this is a diagram of a neural network. Each circle is a neuron that has some inputs and outputs. And each edge connects these neurons and carries a weight, which is a scalar value. Now, we mentioned before that the entire neural network is a function because it has an input and an output. But because each neuron also has an input and an output, each neuron is also a function. But what is this neuron function? Well, it's an activation function.

So moving on to activation functions. Here's a diagram of a neuron. It has inputs A, B, and C and weights W1, W2, and W3. The neuron computes the sum of the products of the inputs and their edge weights, then passes that sum through an activation function, which can typically be one of these. So why do we need activation functions? It's so that the neuron, and hence the neural network, can pick up on complex patterns in data. And that's how neural networks can solve complex tasks.

Now on to the next concept, which is the loss function. The loss function is a function that quantifies how erroneous the predictions of a neural network are for a batch of examples. So the loss function is a function; that means it has inputs and an output. The input to the loss function is typically two parameters: the first is the prediction, the value predicted by the neural network, and the second is the actual ground-truth value. And the output is a single number that quantifies this error. This error is the loss.
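To make these two ideas concrete, here's a tiny sketch in plain Python of a neuron and of a loss function. All names and numbers here are illustrative choices of mine, not code from the video:

```python
# Two of the concepts above, sketched in plain Python.

def relu(z):
    # A common activation function: returns z if positive, else 0
    return max(0.0, z)

def neuron(inputs, weights):
    # A neuron: sum of products of inputs and edge weights, passed
    # through an activation function
    return relu(sum(x * w for x, w in zip(inputs, weights)))

def mean_squared_error(predictions, targets):
    # A loss function: takes predictions and ground-truth values and
    # returns a single number quantifying the error
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

print(neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1]))             # weighted sum is 0.3; ReLU keeps it
print(mean_squared_error([2.5, 0.0, 2.0], [3.0, -0.5, 2.0]))  # a small positive loss
```

A real framework like PyTorch packages these same pieces up as layers and loss modules, which is what we'll see in pass three.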
So the loss function can be, for example, cross-entropy loss for classification problems or mean squared error for regression problems. But this is non-exhaustive, and it can be almost anything you decide to formulate it as. Part of the role of a machine learning engineer is sometimes to engineer these loss functions depending on the task to solve.

The next concept is backpropagation. Backpropagation involves taking the loss and computing the gradient of the loss with respect to the parameters of the model. This gradient is computed from the last layer of the network to the first layer of the network, and hence it is called the backward propagation of errors, or in simpler terms, backpropagation. Now, why do we do this? Let's say we want to train a neural network to predict whether a given image is a dog or not a dog. The neural network's intelligence is all in the edge weights here, the model parameters. They are randomized initially, so it's a dumb neural network, and we want to modify these edge weights so that it can perform the task. This is why we need to train the model, and backpropagation is a part of that training process. It's used to adjust the weights of the network so that the loss is minimized. In the math world, this requires taking the derivative of the loss with respect to the model parameters, and hence backpropagation is required.

Next, let's move on to optimizers. Optimizers define how neural networks learn. With backpropagation, we computed the gradients of the loss with respect to the parameters. Optimizers then use these gradients to update the parameters of the network. So they're like an algorithm that determines the update rule. Any one of these here can be the optimizer, and we typically use Adam. Now let's talk about regularization in neural networks.
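Before we get to regularization, the backpropagation-then-optimizer sequence can be sketched with a single toy weight in PyTorch. This one-parameter setup is my own illustration, not the video's code:

```python
import torch

# One parameter, one example: compute a loss, backpropagate to get the
# gradient, then let Adam use that gradient to update the parameter.
w = torch.tensor(1.0, requires_grad=True)   # a single "edge weight"
optimizer = torch.optim.Adam([w], lr=0.1)

x, y_true = torch.tensor(2.0), torch.tensor(0.0)
loss = (w * x - y_true) ** 2                # squared error for this one example

loss.backward()                             # backpropagation: compute dloss/dw
print(w.grad)                               # analytically, 2 * (w*x - y) * x = 8.0

optimizer.step()                            # Adam's update rule adjusts w
print(w)                                    # w moved in the direction that lowers the loss
```

The division of labor is exactly as described above: `backward()` only computes gradients, and it's the optimizer's `step()` that actually changes the parameter.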
Sometimes AI systems can overanalyze training data to the point where they start memorizing that data, and they won't be good at predicting on unseen data. This is overfitting, and regularization is a technique to decrease overfitting. In neural networks specifically, one way to regularize is through a technique called dropout. Dropout involves randomly turning off neurons in the network during the training process so that the neural network learns along different paths, mitigating memorization and promoting generalization.

It's that time of the video again: have you been paying attention? Let's quiz you to find out. What is the primary purpose of the loss function in a neural network? A. To initialize the weights of the network. B. To quantify how erroneous the network's predictions are during training. C. To determine the learning rate of the network. Or D. To introduce non-linearity to the network. Comment your answer below, and let's have a discussion. That'll do it for quiz time for now, but do pay attention, because again, I will be back to quiz you.

Now for pass three, I want to walk through some PyTorch code for building a neural network. But first, take a look at this original image. This is the image of a neural network you typically see in textbooks and literature, but it has a few problems. First of all, it almost looks like the network takes in one input at a time, a vector of size three. However, in practice, we pass in data as batches, taking advantage of parallelization in neural networks. Also, this image represents the neural network parameters as edge weights, but in practice, the edge weights, the model parameters, are represented simply as tensors of values. Keep this in mind; it'll become clearer when we see the code.
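Before the full walkthrough, here's a minimal sketch of the dropout behavior described a moment ago. The tensor and the drop probability are my own choices for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)          # for a reproducible example
drop = nn.Dropout(p=0.5)      # each value is turned off with probability 0.5

x = torch.ones(8)
drop.train()                  # training mode: dropout is active
print(drop(x))                # some entries zeroed; survivors scaled by 1/(1-p) = 2

drop.eval()                   # evaluation mode: dropout is a no-op
print(drop(x))                # all ones again
```

Note the rescaling of the surviving values: PyTorch scales them up during training so that the expected activation stays the same at inference time, when dropout is switched off.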
So we can begin by importing some libraries, and we're going to work with the iris dataset, a classification dataset where, given sepal and petal measurements, we determine which type of flower it is, one of three classes. We have 150 examples with four features, and 150 labels accordingly. We then split this data into train and test sets, and we convert all of these examples into floating-point tensors. We then initialize a batch size of five. This is because, as mentioned before, data is not passed one example at a time to a neural network; it is passed in batches. We then put all of our data into a Dataset and use DataLoaders. The Dataset provides an abstraction for handling features and labels easily, and the DataLoader makes the dataset iterable and allows batching and shuffling of the data, so everything becomes more streamlined.

To create a neural network, we first need to extend the class torch.nn.Module, and we then override the constructor as well as a function called forward, which defines the forward pass. In the constructor, we define the individual components of our neural network: the first layer, the activation (in this case, ReLU), and the output layer. For the forward pass, we take our input X, pass it into the first layer, then into the ReLU activation, then into the second layer, and that gives us the output.

We then initialize hyperparameters depending on the dataset that we have. For example, we have an input size of four because we have four features, a hidden size of six because there will be six neurons in the hidden layer, and three classes because there are three categories. We'll also initialize a learning rate, which defines how fast we want this network to learn.
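The pieces just described can be sketched as follows. I'm using random stand-in data rather than the actual iris CSV, and the class and variable names are my own, so treat this as a sketch of the structure rather than the video's exact code:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

X = torch.randn(150, 4)          # 150 examples, 4 features (random stand-in for iris)
y = torch.randint(0, 3, (150,))  # labels for 3 flower classes

dataset = TensorDataset(X, y)    # pairs features with labels
loader = DataLoader(dataset, batch_size=5, shuffle=True)  # iterable batches of 5

class IrisNet(nn.Module):
    def __init__(self, input_size=4, hidden_size=6, num_classes=3):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)   # first layer
        self.relu = nn.ReLU()                           # activation
        self.fc2 = nn.Linear(hidden_size, num_classes)  # output layer

    def forward(self, x):
        # input -> first layer -> ReLU -> output layer
        return self.fc2(self.relu(self.fc1(x)))

model = IrisNet()
features, labels = next(iter(loader))
print(model(features).shape)     # torch.Size([5, 3]): one score per class, per example
```

Notice that the forward pass happily accepts a whole batch at once: the model parameters live inside the `nn.Linear` layers as tensors, exactly as discussed when critiquing the textbook diagram.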
Then we define the number of epochs, which is how many times we will go through all 120 examples in our training set in order to learn, and we initialize our neural network accordingly. Now we'll determine a loss function. Because, as I mentioned, this is a classification task, a common loss function is the cross-entropy loss, so we initialize that here. Next, we define an optimizer. The optimizer defines how the updates to the model parameters are made, and we use the Adam optimizer, passing in the model parameters and an initial learning rate.

Next, we have the actual training code. For each epoch, for every batch, we load one batch of data (that's five examples) from the train loader. We make a prediction, pass the prediction along with the actual label into our loss function, and generate a loss. From this loss, we perform backpropagation to compute the gradients, and then optimizer.step actually makes the updates to the model parameters so that the neural network learns. We keep repeating this again and again until the 1000 epochs are complete, and we print out the loss at uniform intervals to see that it is decreasing over time, which means the model is learning, which is good.

Next, we'll do some model inference with the test data, where model.eval sets the model into evaluation mode, or inference mode. We iterate over the data, make predictions, determine whether each was correct or not, and then compute a simple accuracy. In this case, every single example was predicted correctly, which is good.

Quiz time! This is going to be a fun one. Have you been paying attention? Let's quiz you to find out. The algorithm neural networks use to update parameters is defined by: A. The loss function. B. Backpropagation. C. Optimizers. Or D. Regularization. Comment your answer below and let's have a discussion.
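While you think about that, here's a condensed version of the training-and-evaluation flow just walked through. Toy synthetic data stands in for the iris split, I train full-batch for brevity instead of iterating a DataLoader, and the epoch count is reduced, so this is a sketch of the pattern rather than the video's exact code:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(120, 4)              # stand-in for the 120 training examples
y = X[:, :3].argmax(dim=1)           # a learnable toy labeling into 3 classes

model = nn.Sequential(nn.Linear(4, 6), nn.ReLU(), nn.Linear(6, 3))
criterion = nn.CrossEntropyLoss()    # loss function for classification
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(200):             # the video runs 1000 epochs over batches
    optimizer.zero_grad()            # clear gradients from the previous step
    loss = criterion(model(X), y)    # forward pass + loss
    loss.backward()                  # backpropagation: compute gradients
    optimizer.step()                 # optimizer updates the parameters
    losses.append(loss.item())

model.eval()                         # switch to evaluation/inference mode
with torch.no_grad():                # no gradients needed for inference
    preds = model(X).argmax(dim=1)
accuracy = (preds == y).float().mean().item()
print(losses[0], losses[-1], accuracy)   # loss falls as the model learns
```

The one extra line worth calling out is `optimizer.zero_grad()`: PyTorch accumulates gradients by default, so they must be cleared each iteration before the next `backward()` call.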
And like I mentioned before, if you do think I deserve it, please do give this video a wonderful like, and hit subscribe for more videos like this. That'll do it for quiz time for now, and also for this video, but before we go, let's finish with a summary.

Putting it all together for the training process: we want to train the neural network to recognize dogs from not-dogs. We pass a single image into the network, and the network generates a number that indicates the probability of a dog. Along the way, the network might have activation functions to help it recognize complex patterns. It could also have dropout to help prevent overfitting. This prediction, along with the actual label, is passed to the loss function in order to generate a loss. The loss quantifies the error of the network, and we want to adjust the weights of the network to minimize this error. Hence, we backpropagate this loss through the network and compute the gradients of the loss with respect to each parameter. These gradients are then used by the optimizer to determine exactly how the neural network weights should be updated. After rounds of training, the network will be able to take an unseen example and make predictions on it.

And that's all we have today. This just scratches the surface of building neural networks, but I hope it's a good starting point for your neural network journey. Now, if you do think this was a little surface level and you wanted to see a real neural network in action, check out my playlist here, where I code out a transformer neural network from scratch for translating between different languages. It's a fun watch. It might be more complicated than this, but it's pretty cool. Thank you all so much for watching. Please do consider giving this video a like if you think I deserve it, and I will see you in the next one. Bye-bye.