In this segment we will see how to use the nngraph package to build a recurrent neural network in Torch. We will start by comparing the technique for creating a multi-layer perceptron with the standard nn package and with the nngraph package, and we will introduce the parentheses and dash notation alternatives. Moreover, we will show how to visualise the resulting graphs with the graphviz application. We will write a getNode() function, in addition to what comes with the gModule, in order to display the content of a specific node. Then we will see more fun architectures that can be easily built with the nngraph package, and we will implement our very own vanilla recurrent neural network in just three or four lines of code. Finally, we will use a script we have developed here at the e-Lab at Purdue, which can create a recurrent neural network model and its time-unrolled replica, with some annotations that ease the understanding of its architecture.

Let's see where we can find the nngraph package and how to install the graphviz program. We are going to have some fun with the nngraph package of Torch, which is a graph package for crafting neural networks. You may need to install the graphviz utility on Mac or Ubuntu if you would like to visualise the graphs we are going to make. Let's start our interpreter and require nngraph. We are also going to require pretty-nn, and we are going to set the Torch manual seed to zero so that the results I get can be reproduced by you.

Let's create a simple neural network. We start with a Sequential; then I add a Linear that goes, for example, from 20 to 10; then we add a nonlinearity; then one more Linear, in this case from 10 to 10; then one more nonlinearity; and finally our last layer. So in order to write a network which has three layers, I had to write six lines of code. We can print our network here.

Let's see how we can do the same using the nngraph module instead. Our first handle, h1, is going to point at the first module of the network we just generated, with an extra pair of parentheses added. So what is h1? h1 is now an nngraph node. If we would like to look inside h1, we can use the curly-bracket operator and see its content. Inside data.module we recognise the first module of our previous network. Let's actually check that it is the same module: from the previous net we can get the first module's bias (viewing it on one line so it is easier to read), and we do the same for the other one: from h1 we go inside data.module, then to the bias, and view it the same way. The values match.

So let's finish writing this network. We said h1 was simply the first module of the network we had generated before; let's call h2 the rest of the modules. We apply net module 2 to h1, then module 3 to that combination, then module 4, and finally module 5. And here we have h2: it is the second node of the graph, built from h1 by applying the module-2 function, the module-3 function, the module-4 function and the module-5 function. So in two lines of code I now have my network. Let's create gNet, which is going to be a gModule to which I pass h1 as the input and h2 as the output.
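Reconstructed in Lua, the walkthrough above might look like the following sketch (a minimal reconstruction; the names net, h1, h2 and gNet are the ones used in this session, and the bias check at the end is the comparison just described):

```lua
require 'nngraph'                    -- pulls in nn and graph as well
torch.manualSeed(0)                  -- make the results reproducible

-- classic way: six lines of code for a three-layer perceptron
net = nn.Sequential()
net:add(nn.Linear(20, 10))
net:add(nn.Tanh())
net:add(nn.Linear(10, 10))
net:add(nn.Tanh())
net:add(nn.Linear(10, 1))

-- nngraph way: the extra () turns a module into a graph node
h1 = net.modules[1]()                         -- handle on the first module
h2 = net.modules[5](net.modules[4](           -- apply modules 2..5 in turn
        net.modules[3](net.modules[2](h1))))
gNet = nn.gModule({h1}, {h2})                 -- input node h1, output node h2

-- verify that the graph node wraps the very same module
print(net.modules[1].bias:view(1, -1))
print(h1.data.module.bias:view(1, -1))        -- the values match
```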
If we'd like to visualise this, we can call graph.dot on gNet's forward graph and name the output MLP. Let's open MLP.svg, and here we have it: our input is mapped to a Linear, then we have a Tanh, another Linear, a Tanh, a last Linear, and then the output of our module.

OK, so let's now try to forward an input through both networks. We can write x = torch.randn(20), because our first Linear goes from 20 to 10. Then, with net, our sequential network, net:forward(x) gives 0.1856; and if we try gNet:forward(x), we get the same exact result.

Say we would like to check the content of node number 4. For this I usually write a handy function; perhaps it can be done another way, but I'm not aware of one at the moment. So let's add a getNode(myId) function: for each node n in ipairs(self.forwardnodes), if n.id equals myId, we return n.data.module; we return nil if nothing was found. Now if I take gNet and ask for node number 4, the one we were looking for, I get my Linear. And if we would like to check inside it, as usual, we can use the curly brackets.

So, can we make this easier? Yes, we can; let's see how. Let's forget the network we had here and make a new one from scratch, with the same architecture but without sharing the same modules. I write my first handle, g1: it is the first Linear, so from 20 to 10. Then I write my second handle, g2, which is g1 attached to a Tanh, attached to an nn.Linear that goes from 10 to 10, attached to a Tanh again, and finally to our last Linear, which goes from 10 to 1. Then I can put those two together: mlp = nn.gModule of g1, the first handle, and g2, the second handle. Let's visualise this one for fun: graph.dot on mlp's forward graph, calling it MLP2, and open MLP2.svg. Here it is: this second network is basically the same as the first one, and this way of writing it down is much more compact compared with the classical Sequential.

Is this the only advantage? No: the fun just starts now. We can build some interesting architectures using the nngraph package. For example, let's play with this. My first handle, input, is going to be an Identity. Then we have our first block, L1: the input, to which I attach a Linear from 10 to 20 and then a nonlinearity. L2, instead, is going to be the input from above joined with the output of L1 through a JoinTable; then we make a Linear: since my input was 10 and the output of the first block is 20, we will be going from 30, and let's say we go to 60; then we apply a nonlinear function. Then we have our third block, L3: the concatenation of the first block and the second block, so we again do a JoinTable on the first dimension; here we can make a Linear: L1 was 20 and L2 is 60, and 20 plus 60 is 80, so we go from 80 down to a scalar; then we apply a nonlinear function. And let's make our gModule: the input is going to be my input node,
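The visualisation, the forward check and the getNode() helper might look like this in code (a sketch; attaching the function to nn.gModule is one way to do it, since the gModule owns the forwardnodes list):

```lua
graph.dot(gNet.fg, 'MLP', 'MLP')      -- writes MLP.svg, rendered via graphviz

x = torch.randn(20)
print(net:forward(x))                 -- 0.1856
print(gNet:forward(x))                -- same exact result: the modules are shared

-- fetch the module stored in a graph node, given the node id
function nn.gModule:getNode(myId)
   for _, n in ipairs(self.forwardnodes) do
      if n.id == myId then return n.data.module end
   end
   return nil                         -- nothing found
end

print(gNet:getNode(4))                -- the Linear we were looking for
```

And here is a sketch of the two compact networks, using the dash notation (the unary minus turns a module into a node, and node - module chains nodes; the JoinTable calls follow the description above):

```lua
-- the same three-layer MLP, two handles, no Sequential
g1 = - nn.Linear(20, 10)
g2 = g1 - nn.Tanh() - nn.Linear(10, 10) - nn.Tanh() - nn.Linear(10, 1)
mlp = nn.gModule({g1}, {g2})
graph.dot(mlp.fg, 'MLP2', 'MLP2')

-- a fancier architecture with skip connections
input = - nn.Identity()
L1 = input - nn.Linear(10, 20) - nn.Tanh()
L2 = nn.JoinTable(1)({input, L1}) - nn.Linear(30, 60) - nn.Tanh()   -- 10 + 20 = 30
L3 = nn.JoinTable(1)({L1, L2}) - nn.Linear(80, 1) - nn.Tanh()       -- 20 + 60 = 80
g = nn.gModule({input}, {L3})
graph.dot(g.fg, 'fancy', 'fancy')
```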
and the output is just L3. Let's see what this looks like: graph.dot on g's forward graph, calling it fancy, and open fancy.svg. And here we have it. We start with the initial node; then we have an Identity which is sent to two places, because the input is used by L1 and also by L2. On the left-hand side we have L1, and at node 10 we have L2. L2 does its processing, and then node 4 is going to encode our L3, which gets its input from the outputs of L1 and L2: indeed, the last Tanh of L1, node 5, and the output of L2, which is again another Tanh, are both sent to node 4, which joins the tables and performs the last Linear.

Now that we have sharpened our weapons, we can start tackling the RNN. Let's say the dimensionality of the input, n, is equal to 3; the output dimensionality, K, is equal to 1; the dimensionality of the hidden layers, d, is 5; the number of hidden layers, nHL, is two in this case; and the length of the longest sequence we would like to tackle, T, is 4.

We are now going to build our model, and we have to define some specific inputs and some specific outputs. This model will have three inputs. The first is going to be called xx, the second hh1, and the third hh2; let me explain what they are. xx is a terminal for my input sequence, so it goes in here. hh1 is simply the h of the first layer at time t-1, and it is fed in here. And the last one, hh2, is the previous h of layer number 2, at time t-1, and it is fed in here. On the right-hand side we produce the first hidden representation, called simply h1; we also output the second representation, so we can use both for the following steps in the sequence; and then we provide the yy output. The first output corresponds to my h of the first layer at time t; the second, instead, is my h of the second layer, again at time t; and in the last one I will have my predicted sequence element.

So let's build this central block with all these terminals: three inputs and three outputs. xx = - nn.Identity(), hh1 = - nn.Identity(), and hh2 = - nn.Identity(). Then h1 is going to be xx and hh1, the previous state, to which I apply a JoinTable on the first dimension, then a Linear layer going from n + d to d, and then a Tanh nonlinearity. Then we have h2: our h1 concatenated with hh2, then the same JoinTable(1), an nn.Linear which goes from 2*d to d, and the nonlinearity. And the output yy is h2 connected to a Linear from d to K, and then an nn.Tanh. So our RNN is going to be nn.gModule of {xx, hh1, hh2}, and as outputs we send {h1, h2, yy}. Sweet.

Let's try to forward an input here. My x is going to be torch.randn(n), and h0, the initial state, we said is going to be zeros of dimension d. So if I do RNN:forward with my x, a zero state and a zero state (because this is, for example, the first element of my sequence), we get the output: the current h1, h2, and then the y output. Let's visualise this network: graph.dot on RNN's forward graph, calling it RNN.
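Put together, the vanilla RNN block described above really is just a handful of lines. Here is a sketch (a plain table on the left of the dash is handed to the JoinTable as its input, per the dash notation):

```lua
n, d, nHL, K, T = 3, 5, 2, 1, 4   -- input dim, hidden dim, nb of hidden layers,
                                  -- output dim, longest sequence length

-- three input terminals: x(t), h1(t-1), h2(t-1)
xx  = - nn.Identity()
hh1 = - nn.Identity()
hh2 = - nn.Identity()

-- central block: two stacked hidden layers plus the prediction
h1 = {xx, hh1} - nn.JoinTable(1) - nn.Linear(n + d, d) - nn.Tanh()
h2 = {h1, hh2} - nn.JoinTable(1) - nn.Linear(2 * d, d) - nn.Tanh()
yy = h2 - nn.Linear(d, K) - nn.Tanh()

RNN = nn.gModule({xx, hh1, hh2}, {h1, h2, yy})

-- forward the first element of a sequence, with zero initial states
x  = torch.randn(n)
h0 = torch.zeros(d)
out = RNN:forward({x, h0, h0})    -- returns {h1(t), h2(t), y(t)}

graph.dot(RNN.fg, 'RNN', 'RNN')   -- writes RNN.svg
```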
And we can open RNN.svg: an RNN written in four lines of code. So let's see what's happening here. At the beginning we have our three inputs: the three-dimensional input x and the two previous states, each a tensor of size 5. At node 10 we have the Identity connected to x; at node 11, the Identity connected to the previous state of h1; and at node 12, the Identity connected to the previous state of the second hidden layer. Then, in node 8, we compute the concatenation of the input and the previous state, and we perform the linear mapping, to which we then apply the Tanh nonlinearity. Its output is first sent down for the next iteration, and it is also sent to the next block, whose input is two tensors of size 5, since both of them have dimensionality d; we join the tables, so we have a Linear which goes from 10 to 5, then below we have a Tanh, and then h2 is again sent out for the next state, the next iteration. Finally we have the last Linear and nonlinearity, whose result is also sent out. In fact, as we saw when we forwarded our x and two null states, we get three outputs: the first hidden state, the second hidden state, and the final output.

Perhaps this was a bit confusing, so let's make it easier by using a script we have developed in our lab: rnn = require 'rnn'. You can get this file if you go to the e-Lab GitHub account, find the Torch7-demos repository, and then in the rnn-train sample you can find the rnn.lua script I'm using right now. So we can say timeNet, net = rnn.getModel(...), to which we send the dimensionality of the input, the dimensionality d of the hidden states, the number of hidden layers (2 in this case), the dimensionality K of the output, and the maximum length of the sequence. Done, finished. One line, and we have everything we wrote before on the previous screen, plus the cloning with parameter sharing across time steps, needed to perform the training which we will go through in the next lesson.

Let's see what we have generated here: graph.dot on net's forward graph, calling it net, and we can open net.svg. We have the input at node number 10, and it is sent, together with node 11, the previous state of the first hidden layer, to node number 8, where they are joined. Then they go through a linear mapping, and we apply the Tanh nonlinearity. We can see the Linear goes from 8 to 5, because the lowercase n was set to 3 whereas d was set to 5, and 5 plus 3 equals 8; so we go from 8 to 5. Then, below, the previous state of the second hidden layer is sent down to the JoinTable, where it is joined with the state of the first hidden layer; after the concatenation we have a Linear which goes from 10 to 5, because 2 times d is 10; then we apply the nonlinear function, and we end with the last linear mapping followed by the last nonlinearity.

Let's go back and see instead what timeNet is. We type graph.dot on timeNet's forward graph, calling it timeNet, then we open it, and here we have the monster. Let's try to understand what's going on; let's scroll up. OK, we see that in blue we have RNN module 1, RNN module 2, RNN module 3 and RNN module 4: we have replicated our very same model 4 times, in order to capture events that are at most 4 time steps apart.
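In code, this part of the session might look like the sketch below (the module name and the getModel argument order follow the description above; check the rnn.lua script in the e-Lab Torch7-demos repository for the authoritative version):

```lua
rnn = require 'rnn'            -- the e-Lab script described above

-- one call builds the single-time-step model and its T-times unrolled
-- replica, with parameters shared across the clones
timeNet, net = rnn.getModel(n, d, nHL, K, T)

graph.dot(net.fg, 'net', 'net')                -- the single RNN block
graph.dot(timeNet.fg, 'timeNet', 'timeNet')    -- the unrolled "monster"
```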
The first RNN module is then fed with input number 1, the second with input number 2, the third with input number 3, and the fourth with the last, fourth input. Moreover, the previous states of layers 1 and 2, nodes 13 and 14, are sent into RNN module number 1; then the states from module number 1 are sent forward to module number 2, whose states are sent forward to number 3, and whose states are in turn sent forward to module number 4. At the end, the last two states are sent out of the network. Moreover, among the outputs of the network we have the prediction from RNN module 1, the prediction from RNN module 2, the prediction from RNN module 3, and the final prediction from RNN module 4.

In this way we can easily train our system by providing the initial state and the sequence of inputs; we then get as output a sequence of predictions and the final state of the system. We simply compute the loss function between the predictions and the labels, and then backpropagate the gradient of the loss with respect to each and every prediction back into the network. And this is handled automatically by the nngraph package. In the next video we will go through the complete training script, to have a working example of how such a system can be easily trained.
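As a preview of that training script, one step of such a training loop might be sketched as follows (a hypothetical skeleton only: the ordering of timeNet's inputs and outputs, the toy data and the MSE criterion are all assumptions for illustration; the real script is covered in the next lesson):

```lua
criterion = nn.MSECriterion()

-- assumed I/O ordering of the unrolled net:
-- inputs  = {x(1), ..., x(T), h1(0), h2(0)}
-- outputs = {h1(T), h2(T), y(1), ..., y(T)}
local x = {}
for t = 1, T do x[t] = torch.randn(n) end     -- hypothetical toy sequence
local h0 = torch.zeros(d)

local inputs = {x[1], x[2], x[3], x[4], h0, h0}
local outputs = timeNet:forward(inputs)

-- gradient w.r.t. every output; zeros for the two final states
local gradOutputs = {torch.zeros(d), torch.zeros(d)}
local loss = 0
for t = 1, T do
   local target = torch.ones(K)               -- hypothetical label
   local y = outputs[nHL + t]
   loss = loss + criterion:forward(y, target)
   gradOutputs[nHL + t] = criterion:backward(y, target):clone()
end
timeNet:backward(inputs, gradOutputs)         -- backprop through all clones
```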