So, we can now fine-tune BERT on specific downstream NLP tasks. All we need to do is replace the network's fully connected output layers with a fresh set of output layers suited to the task, such as layers that predict the answer span for question answering. Then we perform supervised training on a question-answering dataset. This doesn't take long, because only the new output parameters are learned from scratch; the pretrained parameters are merely fine-tuned slightly, so training is fast.
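
Here is a minimal sketch of what this looks like in practice, assuming the Hugging Face transformers library and PyTorch are available. The answer-span indices and example text are hypothetical placeholders for illustration; a real run would iterate over an actual QA dataset such as SQuAD.

```python
# Sketch: fine-tuning BERT for question answering with Hugging Face transformers.
# The QA head is newly initialized; all other weights come from the pretrained
# checkpoint and are only slightly adjusted during training.

import torch
from transformers import BertTokenizerFast, BertForQuestionAnswering

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
# Loads the pretrained encoder and attaches a fresh span-prediction output layer.
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

question = "Who developed BERT?"
context = "BERT was developed by researchers at Google."
inputs = tokenizer(question, context, return_tensors="pt")

# Hypothetical gold answer span (start/end token indices) for illustration only.
start_positions = torch.tensor([8])
end_positions = torch.tensor([10])

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# One supervised training step: the loss is cross-entropy over the predicted
# start and end token positions of the answer span.
model.train()
optimizer.zero_grad()
outputs = model(**inputs,
                start_positions=start_positions,
                end_positions=end_positions)
outputs.loss.backward()
optimizer.step()
```

Note that loading `BertForQuestionAnswering` from a base checkpoint warns that the QA head weights are freshly initialized, which is exactly the point: only that small head starts from scratch, while the rest of the model starts from its pretrained values.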