So it's time to talk about parallelism. We have just seen that the number of inputs we process at the same time in many cases has precious little effect on the runtime. Hooray for parallelism. Parallelism is one of the main things that makes deep learning the useful, efficient tool that it is for us. Deep learning fundamentally relies on it, and we will talk much more about parallelism in week four.

Just as a reminder, why is deep learning so feasible now relative to, say, when I was interested in the same questions in the 90s? Well, we run on GPUs these days, and GPUs have become incredibly efficient. What we have here is floating-point operations per 2018 dollar: basically, how much GPU compute our money buys. And we can see that it has become much, much more efficient over the last 20 years or so. It really came a long way, and it's much more efficient than CPUs. For many real-world applications, a GPU will give you a factor of 100 or so. That's why it's so important that when you run these Colabs, you generally choose a GPU instance.

Now it's time for you to understand the scaling that we have. We will take the whole network and analyze how training time scales. Understanding this kind of scaling is the key to producing deep learning solutions efficiently.
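To make the batch-parallelism point concrete, here is a minimal sketch (a toy dense layer of my own, not code from the lecture) showing why batch size can have so little effect on runtime: the whole batch goes through one vectorized matrix multiply, so the hardware sees all the work at once instead of one sample at a time.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))   # weights: 4 inputs -> 3 outputs
b = rng.standard_normal(3)        # bias

def forward_single(x):
    """Process one input vector at a time."""
    return x @ W + b

def forward_batch(X):
    """Process a whole batch in one vectorized call."""
    return X @ W + b              # broadcasting applies b to every row

X = rng.standard_normal((8, 4))   # a batch of 8 input vectors

# The batched call computes exactly what the per-sample loop computes,
# but as a single large operation a GPU can execute largely in parallel.
looped = np.stack([forward_single(x) for x in X])
batched = forward_batch(X)
print(np.allclose(looped, batched))  # True
```

On a GPU, the batched form is the one that benefits from the massive parallel throughput discussed above; the loop forces the samples through sequentially.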