OpenAI releases CoT monitoring to stop malicious behavior in large models
Internet reports that OpenAI has released the latest research. Using CoT (Thought Chain) monitoring, it can prevent malicious behaviors such as nonsense and hiding true intentions by large models. It is also one of the effective tools for supervising super models. OpenAI uses the newly released cutting-edge model o3-mini as the monitored object and the weaker GPT-4o model as the monitor. The test environment is a coding task that requires AI to implement functions in the code base to pass unit tests. The results showed that the CoT monitor performed well in detecting systematic "reward hacker" behavior, with a recall rate of 95%, far exceeding 60% of monitoring behavior alone.
Disclaimer: The views in this article are from the original Creator and do not represent the views or position of Hawk Insight. The content of the article is for reference, communication and learning only, and does not constitute investment advice. If it involves copyright issues, please contact us for deletion.