ChatGPT explained in 60 seconds: we start with a GPT model that we fine-tune on pairs of user prompts and desired responses. This supervised fine-tuning teaches the model to follow user prompts. Next, we take a set of prompts and, for each one, pass it to the model to generate several responses. We ask human labelers to rank these responses by quality, and we use these rankings to train a reward model. The reward model takes a prompt and one response and tells us how high quality that response is. We then take an unseen prompt, pass it through the supervised fine-tuned model to generate a response, and use the reward model to score it. That score feeds into the original model's loss function, letting us fine-tune it further. Repeating this fine-tuning of the supervised fine-tuned model gives us ChatGPT.
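The pipeline above can be sketched in a few lines of toy Python. Everything here is an illustrative stand-in, not OpenAI's actual implementation: `sft_generate` fakes sampling responses, the pairwise loss is the standard ranking loss commonly used for reward models, and `rl_step` is a hypothetical one-number stand-in for the PPO update.

```python
import math

def sft_generate(prompt, n=3):
    # Stages 1-2 (sketch): the supervised fine-tuned (SFT) model samples
    # several candidate responses for one prompt. Toy strings stand in
    # for real model outputs.
    return [f"{prompt} -> candidate {i}" for i in range(n)]

def pairwise_reward_loss(r_preferred, r_rejected):
    # Stage 3 (sketch): the reward model is trained from labeler rankings
    # with a pairwise loss, -log(sigmoid(r_w - r_l)). The loss shrinks as
    # the preferred response's score pulls ahead of the rejected one's,
    # so minimizing it teaches the model to score better responses higher.
    return -math.log(1.0 / (1.0 + math.exp(-(r_preferred - r_rejected))))

def rl_step(policy_score, reward, learning_rate=0.1):
    # Stage 4 (sketch): reinforcement learning (PPO in practice) nudges
    # the policy toward outputs the reward model scores highly. A single
    # scalar update stands in for the real gradient step.
    return policy_score + learning_rate * reward

if __name__ == "__main__":
    print(sft_generate("Explain RLHF"))
    # Correctly ordered pair (preferred scored higher) -> small loss;
    # mis-ordered pair -> large loss.
    print(pairwise_reward_loss(2.0, 0.0))
    print(pairwise_reward_loss(0.0, 2.0))
```

Running the script shows the key property of the ranking loss: when the preferred response already scores higher than the rejected one, the loss is small, and when the scores are flipped it grows, pushing the reward model toward agreeing with the human rankings.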