Neal Mohan, the CEO of YouTube, owned by Alphabet Inc. (NASDAQ: GOOG) (NASDAQ: GOOGL), said that if ChatGPT-parent OpenAI used YouTube videos to train its text-to-video AI model, Sora, it would be a "clear violation" of the platform's policies.
What Happened: Last month, in an interview, OpenAI CTO Mira Murati said she wasn't sure whether YouTube videos had been used to train the company's AI model. "If they were publicly available to use, there might be data [used]. But I'm not sure, I'm not confident about it," she said then.
In an interview with Bloomberg on Thursday, Mohan was asked if YouTube was used to train Sora. He said, "I don't know," adding that the Sam Altman-led company would be a better candidate to answer that question.
When the interviewer asked if YouTube videos were being used, Mohan immediately said that would be against their policy. "We have a clear terms of service. From a Creator's perspective, when a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service are going to be abided by."
He explained that YouTube's terms of service allow some content, like the title of a video or the channel's name, to be scraped, since that is necessary to "enable the open web" and lets that content show up in other search engines. However, they do not allow video transcripts or bits to be downloaded. "That is a clear violation of our ToS."
The interviewer then asked if Google is using YouTube to train Gemini, to which Mohan replied, "Google uses YouTube content really in accordance, again, back with those terms of service or individual contracts that we might have with creators or uploaders to our platform."
He went on to explain that YouTube has different licensing contracts with many creators, and "some portion of that YouTube corpus" may get used for such models internally. However, Mohan reiterated that "it's going to be in concert with whatever the terms of service or the contract that creator has signed before uploading their content to YouTube."
Why It Matters: OpenAI's Sora has been controversial since its launch. The AI model, which can generate high-quality videos from textual prompts, has been accused of violating data protection laws by using public social media posts to train its model. This has led to a public spat between OpenAI and Elon Musk, who has accused the company of stealing "everything."
This wasn't the first time Musk and OpenAI clashed over data. The billionaire tech mogul previously terminated OpenAI's access to Twitter's data shortly after acquiring the social media platform. According to reports, he believed the $2 million OpenAI paid annually for the data license was inadequate.
While Murati did not confirm whether publicly available data from YouTube, Instagram, and Facebook had been used to train Sora, she confirmed that the licensed data included data from Shutterstock.
Meanwhile, the Indian government has adopted a firm position regarding the usage of AI data, with the nation's IT Minister affirming that only reliable AI models will be granted access to its data. This action highlights the worldwide apprehension regarding the ethical deployment of AI and the imperative for well-defined regulations.