Now, this has a couple of interesting implications. The first one is, if you give me the weights, I can calculate what the polytopes are. The second one is, the inverse is also true: if you tell me where the polytopes are, I can tell you exactly what the value of each weight is, up to invariances of course. And it's possible to estimate the complexity: there's a nice theorem by Hanin and Rolnick that shows that the expected number of region boundaries intersected along a one-dimensional trajectory is linear in the number of neurons n. So it gives us an idea of the complexity of the functions that can be implemented by hierarchical ReLU networks.

Now, let's do an aside. What we did so far is we looked at ReLUs. We know that they can approximate anything, at least all the functions that are meaningful for us. Could we just have used polynomial features instead? One way of building nonlinear functions is to have a hierarchy through which they become nonlinear. An alternative would be to always have just a shallow network, just one hidden layer, not even that, and instead use polynomial features and just have enough of them. In principle that allows us to do the same thing. So why would that not be a good idea? We all know that neural networks are good, but why are they good? Why is it not just the same? You could just give me x1 and x2 as inputs and then build features like x1 squared times x2, x1 to the power of 7, x2 to the power of 5. Why are these worse features? Well, let's see how well that works. Why don't you go try polynomial features and figure out whether they work well, and if they don't work well, which I'm telling you they don't, why do they not work well? Keep in mind that we're in this low-dimensional space where we can meaningfully visualize the results.
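To make the complexity claim a bit more tangible, here is a minimal sketch, not code from the lecture, that walks along a random line segment through the input space of a randomly initialized deep ReLU network and counts how often the activation pattern changes. The layer widths, the He-style initialization, and the sampling density along the line are all my own assumptions; the point is only to see the crossing count grow roughly linearly with the total number of neurons, as in Hanin and Rolnick's result.

```python
# Sketch: count activation-boundary crossings along a 1-D line through input
# space for a random deep ReLU net. Sizes and initialization are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def count_crossings(widths, d_in=2, n_samples=2000):
    """Count activation-pattern changes along a random line segment."""
    dims = [d_in] + list(widths)
    # Random He-style weights and small random biases for each layer.
    Ws = [rng.normal(0, np.sqrt(2 / dims[i]), (dims[i + 1], dims[i]))
          for i in range(len(widths))]
    bs = [rng.normal(0, 0.1, w) for w in widths]

    # Sample points along a line segment between two random inputs.
    a, b = rng.normal(size=d_in), rng.normal(size=d_in)
    t = np.linspace(0, 1, n_samples)[:, None]
    x = (1 - t) * a + t * b                      # (n_samples, d_in)

    # Forward pass, recording the on/off pattern of every neuron.
    patterns = []
    h = x
    for W, bvec in zip(Ws, bs):
        pre = h @ W.T + bvec
        patterns.append(pre > 0)
        h = np.maximum(pre, 0)
    pattern = np.concatenate(patterns, axis=1)   # (n_samples, total_neurons)

    # A region boundary is crossed whenever the pattern flips between
    # consecutive sample points.
    changed = np.any(pattern[1:] != pattern[:-1], axis=1)
    return changed.sum()

for n in [16, 32, 64, 128, 256]:
    crossings = np.mean([count_crossings([n, n]) for _ in range(20)])
    print(f"{2 * n:4d} neurons -> ~{crossings:.1f} boundary crossings")
```

If you run this, the average number of crossings should scale roughly in proportion to the total neuron count, which is the linear behavior the theorem predicts for the expectation.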
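And as a starting point for the suggested exercise, here is a hedged sketch of one way to set it up: fit the same low-dimensional target once with polynomial features plus a linear model and once with a one-hidden-layer ReLU network, then compare errors both inside and slightly outside the training range. The toy target, the polynomial degree, the network width, and the use of scikit-learn are all assumptions of mine, not the lecture's setup.

```python
# Sketch of the exercise: polynomial features vs. a shallow ReLU network on a
# 2-D toy target. Target function, degree, and width are assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
target = lambda X: np.sin(3 * X[:, 0]) * np.cos(2 * X[:, 1])  # assumed toy target

X_train = rng.uniform(-1, 1, (500, 2))
y_train = target(X_train) + 0.05 * rng.normal(size=500)

X_in  = rng.uniform(-1.0, 1.0, (1000, 2))   # inside the training range
X_out = rng.uniform(-1.5, 1.5, (1000, 2))   # mildly outside it

poly = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=1e-3))
mlp  = MLPRegressor(hidden_layer_sizes=(64,), activation="relu",
                    max_iter=5000, random_state=0)

for name, model in [("degree-9 polynomial", poly), ("64-unit ReLU MLP", mlp)]:
    model.fit(X_train, y_train)
    err_in  = np.mean((model.predict(X_in)  - target(X_in))  ** 2)
    err_out = np.mean((model.predict(X_out) - target(X_out)) ** 2)
    print(f"{name:22s}  in-range MSE {err_in:.4f}   out-of-range MSE {err_out:.4f}")
```

Since we are in two dimensions, you can also evaluate both models on a dense grid and plot the predictions to see, visually, how each one behaves, which is exactly the kind of inspection the exercise asks for.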