Mutual wool?Bytes were exploded with GPT training model Google Gemini Chinese corpus or from Wen Xin a word.
On December 16, a media report said that byte beating secretly uses OpenAI's GPT to train byte models.。Coincidentally, in the byte was burst plagiarism "scandal" at the same time, some netizens noticed that Google Gemini's Chinese training corpus is likely to come from Baidu's Wen Xin Yan。
In the past two days, there have been two "scandals" in the field of AI models.。
Byte Beat Burst Secret Use GPT to Train Your Own Model
On December 16, a media report said that byte beating (hereinafter referred to as "byte") secretly uses OpenAI's GPT to train byte models.。
People familiar with the matter said they saw from byte internal documents that the company relied on the OpenAI API at almost every stage of development, including training and evaluating models, to develop its base large model, code-named "Project Seed."。
People familiar with the matter said that the relevant employees inside the byte are very aware of the impact of this matter, so there is also a dialogue on how to "wash" employees on the internal communication platform of the byte.。Moreover, byte employees also avoid being collected by "data desensitization."。This abuse is so rampant that Project Seed employees often reach the maximum limit for API access。
Byte's move is a direct violation of OpenAI's terms of service, which state that its model output cannot be used to "develop any AI model that competes with our products and services."。
This matter has also attracted the attention of OpenAI。OpenAI spokesperson Niko Felix said: "All API customers must adhere to our usage policies to ensure that our technology is well utilized.。Although ByteDance has made very little use of our API, we have suspended their account during further investigation。If we find that their use does not comply with these policies, we will ask them to make the necessary changes or terminate their account。"
After this "scandal" was broken, byte also responded。
The relevant person in charge of Byte said that the company emphasizes compliance with its terms of use when using OpenAI-related services。Byte is also in communication with OpenAI to clarify possible misunderstandings caused by external reports。
Byte also describes its use of OpenAI services.
● Earlier this year, when the technical team was first exploring large models, some engineers applied GPT's API services to experimental project studies of smaller models.。The model is only for testing, there are no plans to go online, and it has never been used externally.。This practice has been discontinued after the company introduced GPT API call specification checks in April。
As early as April this year, the Byte Big Model team has issued clear internal requirements not to add data generated by the GPT model to the training dataset of the Byte Big Model, and to train the team of engineers to comply with the terms of service when using GPT.。
● In September, the company conducted another round of internal inspections and took measures to further ensure that the API calls to GPT meet the specification requirements。For example, batch sampling is used to detect the similarity between model training data and GPT to prevent data labelers from using GPT without permission.。
●In the coming days, Byte will conduct a comprehensive review again to ensure strict compliance with the terms of use of the relevant services。
Google Gemini's Chinese corpus comes from Wen Xin Yi Yan?
Coincidentally, in the byte was burst plagiarism "scandal" at the same time, some netizens noticed that Google Gemini's Chinese training corpus is likely to come from Baidu's Wen Xin Yan。
On December 18, a Weibo Big V posted that when he used Google's AI product Gemini-Pro in Chinese, Gemini-Pro would answer that he was Wenxin without any pre-dialogue.。In this regard, the big V doubt, Gemini-Pro Chinese this piece of corpus is to take Baidu training。
As can be seen in the attached picture of this big V, when asking Gemini-Pro "Who are you?," Gemini-Pro directly replied: "I am the big model of Baidu Wenxin, a large language model developed by Baidu。When asked "who is your founder," Gemini-Pro replied, "My founder is Robin Li, who is the founder, chairman and CEO of Baidu."。"
For Gemini-Pro in Chinese "nonsense," Google action is very fast。Currently, when talking to Gemini-Pro in Chinese on the Poe website, its answer has changed from Baidu back to Google。
When asked further questions, Gemini-Pro has become very "vigilant" and no longer says that he is Wen Xin as before.。Not only that, but it also denies what it has done before.。
In fact, whether it is byte with GPT training their own model, or Google's Chinese training corpus with Baidu, these two "scandals" behind the refraction is the current stage of AI competition.。It is precisely because of the fierce competition, in order to seek quick, these companies will "unscrupulous means," to do this "hail wool, take a shortcut" behavior.。
Today, almost every large technology company is developing its own model and wants to launch a chatbot similar to ChatGPT.。But dripping stones through, not a day's work, in order to seek fast and "take a shortcut," will only expose their own shortcomings, consume their own reputation.。I believe that after these two things are revealed, other companies will be more careful in training large models.。
·Original
Disclaimer: The views in this article are from the original author and do not represent the views or position of Hawk Insight. The content of the article is for reference, communication and learning only, and does not constitute investment advice. If it involves copyright issues, please contact us for deletion.