So, what we find is that if we keep the number of neurons the same, then there exists an ideal depth. Now, there are lots of possible reasons for this difference. It could be that depth affects expressivity. It could be that it affects what we can learn in practice with our implementation. It could relate to a more general notion of vulnerability. But the reasons behind these graphs we don't currently know, and it's very hard to know.

So, last week we saw that there are many cases where the dynamics of linear models can be understood analytically, which is beautiful because it gives us some intuitions. What if we could understand the dynamics in multi-layer perceptrons? Well, at first that seems impossible, because look, this is a highly complex deep learning system, but let's see whether it might not be possible after all. So, the first thing is, let's just see how linear training is. Imagine you have a very large neural network. Do you expect the weights to change more or less as I make the network bigger and bigger? Well, maybe a bigger network needs less change, but that's just an intuition for the moment. Let me just remind you of how big networks are getting. Here are some text-processing models where you can see they're getting into the billion-parameter range. So, they're getting really quite big. So how much do we expect learning to really change the weights? Well, there's only one good way to find out. Take a big-ish network, train it, plot how the weights change, and ask yourself: does it make small or big changes, and is learning linear or non-linear in this case? We can just have a look at it empirically.
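To make that concrete, here is a minimal sketch of the kind of empirical check described above: train a small MLP on toy data and measure the relative change in the weights, ||θ_T − θ_0|| / ||θ_0||, at several widths. The data, widths, step count, and learning rate are all illustrative assumptions, not the lecture's actual experiment.

```python
# Minimal sketch (assumptions: PyTorch, random regression data, widths and
# hyperparameters chosen purely for illustration). Measures the relative
# weight change ||theta_T - theta_0|| / ||theta_0|| after a short training run.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 10)   # toy inputs
y = torch.randn(256, 1)    # toy targets

def relative_weight_change(width, steps=500, lr=1e-2):
    model = nn.Sequential(nn.Linear(10, width), nn.ReLU(), nn.Linear(width, 1))
    # Snapshot of all parameters at initialization, flattened into one vector.
    theta0 = torch.cat([p.detach().flatten().clone() for p in model.parameters()])
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
    thetaT = torch.cat([p.detach().flatten() for p in model.parameters()])
    return ((thetaT - theta0).norm() / theta0.norm()).item()

for width in [16, 64, 256, 1024, 4096]:
    print(f"width={width:5d}  relative change={relative_weight_change(width):.4f}")
```

If the intuition holds, the printed relative change should shrink as the width grows, i.e., wider networks stay closer to their initialization, which is exactly what would make a linearized analysis of their training dynamics plausible.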