OpenAI Launches Advanced Voice Mode for Select Plus Users

According to OpenAI, the new Advanced Voice Mode provides more natural real-time conversations, lets users interrupt at any time, and senses and responds to the user's emotional tone.

On July 31, OpenAI announced on its official social media account that it had begun rolling out Advanced Voice Mode to a small group of ChatGPT Plus users.

Real-time response and interruptible conversation are widely recognized technical challenges for voice assistants, and both are capabilities the new mode is designed to deliver.

Previously, ChatGPT's voice dialog chained three separate models: one converted the user's speech to text, GPT-4 understood and processed the transcribed prompt, and a third model converted the reply back to speech. GPT-4o, by contrast, is a single multimodal model that handles all of these tasks itself, significantly reducing conversational latency.
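For context, the older pipeline roughly corresponds to chaining three separate API calls, as in this minimal sketch using the openai Python SDK. The model and voice names (whisper-1, gpt-4, tts-1, alloy) reflect OpenAI's public API, not a description of ChatGPT's internals, and the file names are placeholders:

```python
# A minimal sketch of the older three-model voice pipeline. Model names
# mirror OpenAI's public API; ChatGPT's internal setup may differ.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1) Speech-to-text: transcribe the user's voice into text.
with open("user_question.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# 2) Text understanding: send the transcribed prompt to GPT-4.
completion = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = completion.choices[0].message.content

# 3) Text-to-speech: synthesize the reply back into audio.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply_text,
)
speech.write_to_file("assistant_reply.mp3")
```

Each hop in this chain adds latency and discards paralinguistic information such as tone and pacing, which is exactly the bottleneck a single end-to-end multimodal model is meant to remove.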

OpenAI also claims that GPT-4o can sense tones in a user's voice, such as sadness or excitement, and can even tell when the user is singing.

OpenAI said that although only a small group of users can join the Advanced Voice Mode test at first, it plans to expand the rollout gradually and make the feature available to all ChatGPT Plus users in the fall.

ChatGPT has supported voice conversations since last September, and OpenAI showed off a more advanced version this May. But the version shown in May sparked a dispute over voice likeness.

In May, the American actress Scarlett Johansson said she was shocked and outraged that one of ChatGPT's voices, named Sky, sounded strikingly similar to her own.

According to Johansson, OpenAI CEO Sam Altman had contacted her last year about voicing ChatGPT, but she declined for personal reasons.

After Johansson's legal representatives contacted OpenAI, the company said, "Out of respect for Ms. Johansson, we have paused the use of Sky's voice in our products."

Perhaps wary of a repeat of that dispute, OpenAI added a note beneath the post announcing the launch: "We tested the voice capabilities of GPT-4o with over 100 external red teamers across 45 languages. To protect privacy, we trained the model to speak only in the four preset voices."

Advanced Voice Mode is limited to ChatGPT's four preset voices (Juniper, Breeze, Cove, and Ember), which were created in collaboration with paid voice actors.

OpenAI also said it has built systems that prevent ChatGPT from outputting any voice other than those four, and has implemented safeguards that block requests for violent or copyrighted content.
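OpenAI has not published how these systems work. One common pattern for this kind of guardrail, though, is an output classifier that compares the generated audio's speaker embedding against the approved preset voices and discards anything that deviates. The sketch below is purely illustrative: the embeddings, threshold, and helper names are hypothetical and do not describe OpenAI's implementation.

```python
import numpy as np

# Hypothetical speaker embeddings for the four preset voices, e.g.
# produced by a speaker-verification model from reference recordings.
# Random vectors stand in for real embeddings in this sketch.
PRESET_VOICES = {
    "Juniper": np.random.rand(256),
    "Breeze": np.random.rand(256),
    "Cove": np.random.rand(256),
    "Ember": np.random.rand(256),
}

SIMILARITY_THRESHOLD = 0.85  # hypothetical cutoff

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_approved_voice(output_embedding: np.ndarray) -> bool:
    """Return True only if the generated audio matches a preset voice."""
    best = max(
        cosine_similarity(output_embedding, ref)
        for ref in PRESET_VOICES.values()
    )
    return best >= SIMILARITY_THRESHOLD

# A guardrail built this way would run after speech generation: if the
# check fails, the system discards the audio instead of playing a voice
# outside the approved set.
```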
