Then we initialize the structure. This is the Cottonwood model class, our blank chalkboard where we start placing these different blocks and assembling them as layers in a neural network. Initially the classifier is empty.

The next step is to add in the blocks one at a time. We do that with the classifier's add method, and we give each block a name. That way, if we're adding five different convolution blocks and a bunch of different activation blocks, we can refer to one in particular when we want to reach in and pull out a value or measure a distribution of what's happening there. So we give each block a name as we create it.

First we add the training data block, which we name, not very creatively, "training data". We add a convolution block, which we initialize with our kernel size, our regularization parameters, the number of kernels, and the optimizer we've chosen, and we name it "Convolution2D 0". We add a bias block, then a hyperbolic tangent block, which we call "act" for activation function. Then we repeat this: we add another convolution block, initialized the same way but given a different name, another bias block, and another hyperbolic tangent block.

Now that we've gone through two convolution layers, we do pooling in two dimensions, with a stride of two and a window size of three. By trial and error, this is a combination that seems to work well, so we're rolling with it for now. We can always change those later and see whether it helps or hurts us.

Then we use a flatten block to go from our two-dimensional array (actually, at this point, a three-dimensional array, because we have a number of channels) to a single flat, one-dimensional set of inputs. We pass those to a linear layer, which gets its own L1 and L2 regularization parameters, the number of classes (that is, the number of output nodes), and an optimizer, and we name it "linear 2". We give it its accompanying bias block and logistic activation function.

Then we add a copy block. This lets us make two copies of the logistic function's output and send one to our loss function and the other to our hardmax function. Finally, we create the loss function itself, a hinge loss, which we'll describe in detail later; a one-hot block; and the hardmax block, which turns our zero-to-one-valued predictions from the logistic output into a solid categorical prediction.
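To make that sequence concrete, here is a minimal sketch of the named-block assembly pattern just described. This is a stand-in, not the real Cottonwood API: the Structure and Block classes are defined inline so the example runs on its own, and every parameter value (kernel counts, regularization strengths, class counts) is made up for illustration.

```python
class Structure:
    """Stand-in for the model class: holds named blocks in order."""
    def __init__(self):
        self.blocks = {}  # name -> block, insertion-ordered

    def add(self, block, name):
        self.blocks[name] = block

    def get(self, name):
        # Reach in and pull a block back out by name to inspect it
        return self.blocks[name]


class Block:
    """Placeholder for Conv2D, Bias, Tanh, Pool2D, Flatten, etc."""
    def __init__(self, **params):
        self.params = params


classifier = Structure()  # the blank chalkboard, initially empty

classifier.add(Block(), "training data")

# First convolution layer: convolution -> bias -> tanh activation
classifier.add(Block(n_kernels=16, kernel_size=3,
                     l1=1e-4, l2=1e-4), "Convolution2D 0")
classifier.add(Block(), "bias 0")
classifier.add(Block(), "act 0")

# Second convolution layer: same recipe, different name
classifier.add(Block(n_kernels=16, kernel_size=3,
                     l1=1e-4, l2=1e-4), "Convolution2D 1")
classifier.add(Block(), "bias 1")
classifier.add(Block(), "act 1")

# Pool with a window of 3 and a stride of 2, then flatten to 1D
classifier.add(Block(window=3, stride=2), "pool 2D")
classifier.add(Block(), "flatten")

# Linear layer sized to the number of classes, with bias and logistic
classifier.add(Block(n_outputs=10, l1=1e-4, l2=1e-4), "linear 2")
classifier.add(Block(), "bias 2")
classifier.add(Block(), "logistic")

# Copy the logistic output: one copy to the loss, one to the hardmax
classifier.add(Block(), "copy")
classifier.add(Block(), "loss")  # hinge loss
classifier.add(Block(), "one hot")
classifier.add(Block(), "hardmax")

# The payoff of naming: any block can be retrieved later for inspection
conv_0 = classifier.get("Convolution2D 0")
print(conv_0.params["kernel_size"])  # 3
```

The get call at the end is the reason for all the naming: once the model is assembled, any individual block can be pulled back out to check a value or measure the distribution of what's flowing through it.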
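The hardmax step at the end is simple enough to show directly. This small sketch, using NumPy, illustrates the idea from the transcript: take the zero-to-one-valued logistic outputs and replace them with a single categorical choice.

```python
import numpy as np

def hardmax(x):
    """Turn 0-to-1 predictions into a solid categorical prediction:
    1 at the position of the largest value, 0 everywhere else."""
    result = np.zeros_like(x)
    result[np.argmax(x)] = 1
    return result

# Example: logistic outputs for four classes
print(hardmax(np.array([0.2, 0.7, 0.4, 0.1])))  # [0. 1. 0. 0.]
```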