
Meta CEO Mark Zuckerberg Allegedly Allowed Llama AI Models to Train on Copyrighted Material

Meta is facing a copyright lawsuit over its alleged use of copyrighted works to train artificial intelligence (AI) models. The lawsuit was filed by multiple plaintiffs, including several best-selling authors. The main accusation against the tech giant is that it used pirated e-books and articles to train older versions of its Llama AI models, thereby violating copyright. The filings also accuse the company’s CEO, Mark Zuckerberg, of allowing the Llama AI team to torrent from an alleged link aggregator to access copyrighted material.

The information comes from two separate filings made on Wednesday in the U.S. District Court for the Northern District of California. The filings, from plaintiffs including authors Sarah Silverman and Ta-Nehisi Coates, highlight testimony Meta gave in late 2024, which revealed that Zuckerberg had allowed a dataset called LibGen to be used to train Llama AI models.

It is worth noting that LibGen (short for Library Genesis) is a file sharing platform that offers free access to academic and general content. Many consider it a pirate library because it provides access to copyrighted works that would otherwise be behind a paywall or not digitized at all. The platform has been the subject of several lawsuits and has been ordered shut down in the past.

The documents allege that Meta used the LibGen dataset with full knowledge that it contained pirated content and violated copyright laws. They also cite a memo to Meta’s AI decision-makers, which mentioned that after “escalation to MZ,” Meta’s AI team “has been approved to use LibGen.” Here, MZ is an abbreviation of the name of Meta’s CEO.

The memo also mentioned that management was concerned that public knowledge of the use of “a dataset we know to be pirated, such as LibGen,” could undermine the company’s negotiating position with regulators. The social media giant was also accused of removing copyright notices from the dataset’s text and metadata to conceal the infringement.

The documents allege that Nikolay Bashlykov, a research engineer working in Meta’s artificial intelligence division, removed copyright notices from the LibGen dataset. To further conceal evidence of its use of the purported dataset, “Meta developers included ‘supervised samples’ of data when tuning Llama to ensure that Llama’s results would provide less biased answers when answering questions about the source of Meta’s training data,” the document stated.

The plaintiffs also alleged that Meta had committed another type of copyright infringement simply by accessing LibGen. The documents claimed that the tech giant had torrented the LibGen dataset. Torrenting involves both downloading and uploading (also known as seeding) content. The uploading could be considered distribution of copyrighted material and constitute infringement, the filings claim.

“If Meta had purchased Plaintiffs’ works in a bookstore or borrowed them from a library and trained its Llama models on them without a license, it would have been guilty of copyright infringement. Meta’s decision to bypass legal methods of purchasing books and become a willful participant in an illegal torrent network constitutes a violation of CDAFA (California Comprehensive Computer Data Access and Fraud Act) and serves as evidence of copyright infringement,” the filings read.

The copyright lawsuit is ongoing and a ruling is pending. Meta has not yet presented its arguments, which will likely be based on fair use. The court will have to decide whether the generative capabilities of the AI model can be considered transformative enough to support this argument.
