OpenAI recently announced a new model called CriticGPT, based on GPT-4. Unlike the company's other, consumer-facing models, CriticGPT is designed to “write critiques of ChatGPT responses to help human trainers spot mistakes during reinforcement learning from human feedback (RLHF).”
CriticGPT will help OpenAI's human trainers “catch errors in ChatGPT’s code output.” According to OpenAI, trainers who review code with CriticGPT’s help outperform those without it 60 per cent of the time. The company is currently integrating CriticGPT-like models into its RLHF labelling pipeline to assist AI trainers in evaluating outputs from advanced AI systems.
We’ve trained a model, CriticGPT, to catch bugs in GPT-4’s code. We’re starting to integrate such models into our RLHF alignment pipeline to help humans supervise AI on difficult tasks: https://t.co/5oQYfrpVBu
— OpenAI (@OpenAI) June 27, 2024
OpenAI says models like CriticGPT can help make ChatGPT more accurate by catching subtle mistakes, including errors that human trainers might miss as models become more knowledgeable.
To train CriticGPT, trainers manually edited ChatGPT-generated code, deliberately inserting new errors along with sample feedback, so the model would learn to identify both common and less common mistakes.
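OpenAI has not published CriticGPT's training-data format, but the tampering process described above can be sketched as pairing deliberately buggy code with a reference critique. The function and field names below are purely illustrative assumptions, not OpenAI's actual schema:

```python
# Illustrative sketch only: pair tampered code with the critique a trainer
# wrote about the inserted bug, as described in the article.

def make_tampered_example(original_code: str, buggy_code: str, critique: str) -> dict:
    """Bundle a (buggy code, reference critique) training pair.

    A trainer edits `original_code` into `buggy_code` by inserting an error,
    then writes `critique` describing that error, so a critic model can be
    rewarded for finding it.
    """
    return {
        "original": original_code,
        "tampered": buggy_code,
        "reference_critique": critique,
    }

# Hypothetical example: an inserted indexing bug in model-written code.
example = make_tampered_example(
    original_code="def last(xs):\n    return xs[-1]",
    buggy_code="def last(xs):\n    return xs[0]",
    critique="Bug: returns the first element instead of the last.",
)
```

The critic model is then trained to reproduce critiques like the reference one when shown only the tampered code.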
Like human suggestions, CriticGPT’s suggestions are not always correct. However, a Human+CriticGPT team is said to outperform unassisted human trainers, and the model helps trainers write more “comprehensive critiques” while producing fewer hallucinations.
OpenAI also notes limitations: real-world mistakes can be spread across many parts of an answer, which CriticGPT struggles to catch, and it cannot reliably evaluate extremely complex tasks or responses. Still, the company says the model will help human trainers “produce better RLHF data for GPT-4,” and OpenAI plans to scale this work further.