The data itself is pretty interesting to dig into. It's a set of 28 by 28 pixel grayscale images, so you can picture each image as a two-dimensional array, 28 rows by 28 columns, with a size of only one in the third dimension: one panel, or one channel. It was pulled from a larger data set compiled by the National Institute of Standards and Technology, NIST, and the digits were then normalized for size and centered in the image. So it's good to keep in mind as we go into this that a fair amount of legwork has been done to get the data into its nice, clean current state. That's part of what makes MNIST so great to work with as a starter data set. It's also part of what makes it a little less impressive when a model does well on MNIST digits: that doesn't necessarily mean it'll do well on data in general. This is very nicely cleaned up data.

The rest of this description gives some small but fascinating technical details about how the data was processed and what was considered in making this data set. It's worth a careful read, if only to get a sense of what type of work you have to do to get a data set ready for nice, convenient processing in a neural network.

Another fun feature of this page is that you can see, as of this writing at least, the error rates that various approaches have been able to achieve on this data. For future reference, we will be working here with a two-layer convolutional neural network. Some of these have around 1% error, meaning one out of every 100 examples gets the wrong label, and that's what we can compare ourselves against.
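To make the 28 by 28 by 1 layout and the idea of an error rate concrete, here is a minimal sketch in Python with NumPy. It is only an illustration: the images are random stand-in values rather than real MNIST digits, and the names `images`, `error_rate`, `true_labels`, and `predicted_labels` are made up for this example.

```python
import numpy as np

# Stand-in for a batch of MNIST-style images: random pixel values, NOT real digits.
rng = np.random.default_rng(0)
n_examples = 100
images = rng.integers(0, 256, size=(n_examples, 28, 28, 1), dtype=np.uint8)

# One image is 28 rows by 28 columns with a single grayscale channel.
one_image = images[0]
print(one_image.shape)  # (28, 28, 1)


def error_rate(predicted, actual):
    """Fraction of examples whose predicted label differs from the true label."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    return float(np.mean(predicted != actual))


# Hypothetical labels: copy the true labels, then get exactly one of them wrong.
true_labels = rng.integers(0, 10, size=n_examples)
predicted_labels = true_labels.copy()
predicted_labels[0] = (predicted_labels[0] + 1) % 10

# One wrong label out of 100 examples is a 1% error rate.
print(error_rate(predicted_labels, true_labels))  # 0.01
```

The same `error_rate` calculation works for any pair of label arrays, so it could be pointed at a real model's predictions on the real test set to get a number comparable to the ones listed on the page.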