HawkInsight

OpenAI Launches New AI Model With Human Doctorate-Level Reasoning

On September 12, OpenAI announced a new series of reasoning models designed to solve difficult problems. The first model in the series, OpenAI o1, is available immediately as a preview.

OpenAI said the o1 models can reason through complex tasks and solve harder problems in science, coding, and math than its previously released models.

“These models spend more time thinking about the problem before reacting, just like humans do,” OpenAI said. “Through training, they learn to refine their thought processes, try different strategies, and recognize their mistakes.”

How powerful is the o1 model?

The o1 model ranked in the 89th percentile on Codeforces, a competitive programming platform, and reached 83.3% accuracy on the 2024 American Invitational Mathematics Examination (AIME), a qualifying exam for the USA Mathematical Olympiad, placing it among the top 500 students in the United States.

By comparison, GPT-4o scored in the 11th percentile on Codeforces and reached only 13.4% accuracy on AIME.

Additionally, the o1 model performed exceptionally well on GPQA (Graduate-Level Google-Proof Q&A), a graduate-level benchmark. GPQA is a challenging dataset containing hundreds of biology, physics, and chemistry questions written by experts in those fields.

Typically, experts who hold or are pursuing a Ph.D. in the relevant field score 70% or lower on GPQA, while the o1 model achieved 78% accuracy. By this measure, the o1 model has reached the level of a human Ph.D. student.

Beyond these difficult tests, the o1 model also outperforms GPT-4o on broader benchmarks. For example, it beats GPT-4o in 54 of the 57 subcategories of the Massive Multitask Language Understanding (MMLU) test.

Thanks to its stronger reasoning, the o1 model also gave better answers to reasoning-heavy questions, such as those in coding and math.

OpenAI says that in a human preference assessment comparing anonymized answers from o1-preview and GPT-4o, human trainers preferred the o1-preview answers, especially in the more reasoning-intensive categories of data analysis, coding, and math, where o1-preview was preferred by a wide margin. However, o1-preview performed slightly worse on some natural language tasks, suggesting that the model is not suitable for all use cases.

Stronger and more expensive

As a new reasoning model designed for complex tasks requiring extensive general knowledge, the o1 model costs considerably more than OpenAI's standard models.

According to the OpenAI website, the o1-preview model is priced at $15 per million input tokens and $60 per million output tokens, which is three times and four times the price of GPT-4o, respectively.
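To make the price gap concrete, the following is a minimal sketch of the arithmetic behind those figures. The GPT-4o prices are derived here from the multiples quoted above rather than read from a price list, and the request size (2,000 input tokens, 1,000 output tokens) is a hypothetical chosen only for illustration.

```python
from decimal import Decimal

# Prices in USD per one million tokens, as quoted in the article.
O1_PREVIEW_INPUT = Decimal("15")
O1_PREVIEW_OUTPUT = Decimal("60")

# Price multiples quoted in the article (o1-preview relative to GPT-4o).
INPUT_MULTIPLE = Decimal("3")
OUTPUT_MULTIPLE = Decimal("4")

TOKENS_PER_MILLION = Decimal("1000000")


def implied_gpt4o_prices():
    """GPT-4o prices implied by the quoted multiples (an inference, not a price list)."""
    return O1_PREVIEW_INPUT / INPUT_MULTIPLE, O1_PREVIEW_OUTPUT / OUTPUT_MULTIPLE


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD of one request, given per-million-token prices."""
    total = Decimal(input_tokens) * input_price + Decimal(output_tokens) * output_price
    return total / TOKENS_PER_MILLION


gpt4o_input, gpt4o_output = implied_gpt4o_prices()

# Hypothetical request size, used only to illustrate the calculation.
example_input_tokens = 2000
example_output_tokens = 1000

o1_cost = request_cost(
    example_input_tokens, example_output_tokens, O1_PREVIEW_INPUT, O1_PREVIEW_OUTPUT
)
gpt4o_cost = request_cost(
    example_input_tokens, example_output_tokens, gpt4o_input, gpt4o_output
)

print(f"Implied GPT-4o prices: ${gpt4o_input} input, ${gpt4o_output} output per million tokens")
print(f"o1-preview cost for the example request: ${o1_cost}")
print(f"GPT-4o cost for the example request: ${gpt4o_cost}")
```

Because output tokens carry the larger multiple, the overall cost ratio for a given workload depends on how much text the model generates relative to how much it reads.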

OpenAI said the o1 model is best suited to users solving complex problems in science, coding, math, and similar fields. For example, medical researchers can use it to annotate cell sequencing data, and physicists can use it to generate the complex mathematical formulas needed for quantum optics. For users who need a cheaper option, OpenAI also offers o1-mini.

o1-mini is a faster, lower-cost reasoning model designed for use cases involving coding, math, and science. As a smaller model, it is 80% cheaper than o1-preview.
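As a rough illustration of what "80% cheaper" means in dollar terms, the sketch below applies that discount to the o1-preview prices quoted earlier in the article ($15 and $60 per million input and output tokens). It assumes the 80% figure applies equally to input and output tokens, which the article does not state explicitly.

```python
from decimal import Decimal

# o1-preview prices in USD per one million tokens, as quoted in the article.
O1_PREVIEW_PRICES = {"input": Decimal("15"), "output": Decimal("60")}

# "80% cheaper" from the article; applying it uniformly to both token
# types is an assumption made for this sketch.
DISCOUNT = Decimal("0.80")


def discounted(prices, discount):
    """Return a new price table with the same fractional discount applied to each entry."""
    remaining = Decimal("1") - discount
    return {kind: price * remaining for kind, price in prices.items()}


o1_mini_prices = discounted(O1_PREVIEW_PRICES, DISCOUNT)

for kind, price in o1_mini_prices.items():
    print(f"o1-mini {kind}: ${price} per million tokens")
```

Decimal arithmetic is used so the discounted prices come out as exact dollar amounts rather than binary floating-point approximations.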

Both o1-preview and o1-mini have 128K-token context windows and a knowledge cutoff of October 2023.

ChatGPT Plus and Team users can access o1-preview and o1-mini starting September 12. OpenAI said it plans to extend o1-mini access to all free ChatGPT users, but it has not yet set a date.

Although the current o1 model is relatively slow and expensive to use, AI researchers see cracking reasoning as an important step toward human-level intelligence. They believe that a model whose capabilities go beyond pattern recognition could lead to breakthroughs in fields such as medicine and engineering.

“We have been spending many months working on reasoning because we think this is actually the critical breakthrough,” Bob McGrew, OpenAI's chief research officer, said in an interview. “Fundamentally, this is a new modality for models in order to be able to solve the really hard problems that it takes in order to progress towards human-like levels of intelligence.”

