Nvidia reportedly sanctioned video scraping to train AI model according to leaked docs

According to a recent report from 404 Media’s Samantha Cole, leaked internal communications from Nvidia suggest the company is following what appears to be an industry trend: big tech companies taking the ‘ask for forgiveness instead of permission’ approach to the data they use to train AI models.

Even when employees raised legal and ethical concerns, managers reportedly described the company’s practice of scraping millions of hours of video from YouTube, Netflix, and other data sets as “an executive decision” in one instance and “an open legal issue” in another.

If you were still on the fence regarding the ongoing debate about the legal and ethical aspects of where AI companies get their training data, this might be enough to make you pick a side.

Won’t somebody please think of the creators?

cool cool cool cool cool cool now leaked NVIDIA slack messages discussing which YouTube channels to scrape videos from. MKBHD videos? Yeah grab those too. https://t.co/0XczvTNVBH

— Marques Brownlee (@MKBHD) August 5, 2024

Nvidia has opted to stick to its guns regarding its unscrupulous scraping. As Cole writes, “When asked about legal and ethical aspects of using copyrighted content to train an AI model, Nvidia defended its practice as being ‘in full compliance with the letter and the spirit of copyright law.’”

Well, the leaked Slack conversations and emails from the team working on a project codenamed ‘Cosmos’ tell a different story.

As does YouTube CEO Neal Mohan, who said in April that using YouTube videos to train AI models is a “clear violation” of the platform’s terms. Back then, he was responding to reports that OpenAI used YouTube videos to train its Sora text-to-video generator.

Just last month, AI startup Runway came under similar fire when another 404 Media report alleged that it used YouTube videos and pirated content as training data without permission. Can you see the pattern yet?