How many models does ChatGPT have, or rather, how many does it use? There are three. It has a supervised fine-tuned (SFT) model to understand and respond to user inputs. It has a reward model to help it assess whether a response is of good quality or not. And then the core of it is the proximal policy optimization (PPO) model, which starts from the first model and uses the second model to fine-tune itself, becoming better and better at answering questions in a factual and safe way.
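To make the roles of the three models concrete, here is a toy sketch. It is not OpenAI's implementation; everything in it is an assumption for illustration. There is one prompt with three canned responses. `SFT_LOGITS` stands in for the SFT model. `reward_model` is a hypothetical scorer with invented scores. `ppo_train` copies the SFT model into a policy and improves it with the PPO clipped objective, with a KL-style penalty that keeps it close to the SFT model. For simplicity the sketch computes exact expectations instead of sampling and uses a numerical gradient instead of autograd.

```python
import math

RESPONSES = ["helpful answer", "vague answer", "unsafe answer"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Model 1 (hypothetical stand-in): the SFT model, here just uniform logits.
SFT_LOGITS = [0.0, 0.0, 0.0]

# Model 2 (hypothetical stand-in): a reward model with invented scores.
def reward_model(response):
    return {"helpful answer": 1.0, "vague answer": 0.2, "unsafe answer": -1.0}[response]

def ppo_objective(logits, old_probs, advantages, eps=0.2):
    """PPO clipped surrogate, as an exact expectation under the old policy."""
    probs = softmax(logits)
    total = 0.0
    for p, p_old, adv in zip(probs, old_probs, advantages):
        ratio = p / p_old
        clipped = min(max(ratio, 1 - eps), 1 + eps)
        total += p_old * min(ratio * adv, clipped * adv)
    return total

def ppo_train(iterations=20, inner_steps=4, lr=0.5, beta=0.05, h=1e-5):
    # Model 3: the policy starts as a copy of the SFT model.
    logits = list(SFT_LOGITS)
    sft_probs = softmax(SFT_LOGITS)
    for _ in range(iterations):
        old_probs = softmax(logits)
        # Reward-model score minus a KL-style penalty toward the SFT model.
        rewards = [reward_model(r) - beta * math.log(p / q)
                   for r, p, q in zip(RESPONSES, old_probs, sft_probs)]
        baseline = sum(p * r for p, r in zip(old_probs, rewards))
        advantages = [r - baseline for r in rewards]
        for _ in range(inner_steps):
            # Gradient ascent on the clipped objective (forward differences).
            base = ppo_objective(logits, old_probs, advantages)
            grads = []
            for j in range(len(logits)):
                bumped = list(logits)
                bumped[j] += h
                grads.append((ppo_objective(bumped, old_probs, advantages) - base) / h)
            logits = [x + lr * g for x, g in zip(logits, grads)]
    return softmax(logits)

probs = ppo_train()
for response, p in zip(RESPONSES, probs):
    print(f"{response}: {p:.3f}")
```

The clipping (`eps`) limits how far a single iteration can move the policy away from its previous version. The `beta` penalty discourages drifting too far from the SFT model. After training, the toy policy puts most of its probability on the response the reward model scores highest.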