Authors Seek Meta’s Torrent Client Logs and Seeding Data in AI Piracy Probe

Over the past two years, AI development has progressed at a rapid pace.

This includes large language models, which are typically trained on broad datasets of texts; the more, the better.

When AI hit the mainstream, it became apparent that many rightsholders had concerns over the unauthorized use of their copyrighted works. Creatives including photographers, artists, musicians, journalists, and authors responded by filing copyright infringement lawsuits to protect their rights.

Book authors, in particular, complained about the use of pirated books as training material. In various lawsuits, companies including OpenAI, Microsoft, Meta, and NVIDIA are accused of obtaining books from ‘pirate’ sources, including the controversial Books3 database and shadow library LibGen.

Meta Acknowledges ‘Pirate’ Sourcing Early On

One of the most intriguing cases, especially for those interested in the piracy angle, is the class action lawsuit filed by authors including Richard Kadrey, Sarah Silverman, and Christopher Golden. The authors accuse Meta of using their work without permission.

While this may sound problematic to some, Mark Zuckerberg’s Meta didn’t beat around the bush. More than a year ago, the company admitted that unofficial sources containing pirated content were used as training input.

Crucially, however, Meta denied the copyright infringement allegations, noting that it would rely on a fair use defense, at least in part.

“To the extent that Meta made any unauthorized copies of any Plaintiffs’ registered copyrighted works, such copies constitute fair use under 17 U.S.C. § 107,” Meta said in its early response.

A Spotlight on Meta’s Torrenting Activity

The fair use defense will be central in many AI copyright infringement lawsuits. AI companies generally believe that use of ‘public’ data as training inputs is justified. They characterize the use as transformative and argue that it doesn’t compete with the original market for these works.

Whether that is indeed the case is a question that may ultimately end up at the Supreme Court. Meanwhile, however, rightsholders in this lawsuit have raised additional allegations of copyright infringement.

A few weeks ago, the plaintiffs asked for permission to submit a third amended complaint. They argued that the request was justified after they uncovered Meta’s use of BitTorrent to source copyright-infringing training data from the pirate shadow library LibGen.

Specifically, the authors say that Meta willingly used BitTorrent to download pirated books from LibGen, knowing that was legally problematic. As a result, Meta allegedly shared copies of these books with other people, as is common with the use of BitTorrent.

“By downloading through the bit torrent protocol, Meta knew it was facilitating further copyright infringement by acting as a distribution point for other users of pirated books,” the amended complaint notes.

“Put another way, by opting to use a bit torrent system to download LibGen’s voluminous collection of pirated books, Meta ‘seeded’ pirated books to other users worldwide.”

Court Greenlights Torrent Piracy Probe

Meta believed that the allegations weren’t sufficiently new to warrant an update to the complaint. The company argued that it was already a well-known fact that it used books from these third-party sources, including LibGen.

However, the authors maintained that the ‘torrent’ angle is novel and important enough to warrant an update. Last week, United States District Judge Vince Chhabria agreed, allowing the introduction of these new allegations.

In addition to greenlighting the amended complaint, the Judge also allowed the authors to take further testimony on the “seeding” angle.

“[E]vidence about seeding is relevant to the existing claim because it is potentially relevant to the plaintiffs’ assertion of willful infringement or to Meta’s fair use defense,” Judge Chhabria wrote last week.

Authors Want Meta’s Torrent Client Logs and Seeding Data

With the court recognizing the relevance of Meta’s torrenting activity, the plaintiffs requested reconsideration of an earlier order, where discovery on BitTorrent-related matters was denied.

Through a filing submitted last Wednesday, the plaintiffs hope to compel Meta to produce its BitTorrent logs and settings, including peer lists and seeding data.

“The Order denied Plaintiffs’ motion to compel production of torrenting data, including Meta’s BitTorrent client, application logs, and peer lists. This data will evidence how much content Meta torrented from shadow libraries and how much it seeded to third parties as a host of this stolen IP,” they write.

While archiving lists of seeders is not a typical feature of a torrent client, the authors are asking Meta to disclose any relevant data.

In addition, they also want the court to reconsider its ruling regarding the crime-fraud exception. That’s important, they suggest, as Meta’s legal counsel was allegedly involved in matters related to torrenting.

“Meta, with the involvement of in-house counsel, decided to obtain copyrighted works without permission from online databases of copyrighted works that ‘we know to be pirated, such as LibGen’,” they write.

The authors allege that this involved “seeding” files and that Meta attempted to “conceal its actions” by limiting the amount of data shared with the public. One Meta employee also asked for guidance, as “torrenting from a corporate laptop doesn’t feel right.”

Meta as Distributor

With the addition of a torrent angle, the amended complaint adds a new element to the case, one that could potentially be crucial, particularly for the fair use defense.

The plaintiffs now accuse Meta of operating as a distributor of the pirated works. While that has little to do with how the works were used to train AI, it’s a copyright claim, nonetheless, and one that might be harder to defend as fair use.

Whether this will substantially change the case has yet to be seen, but it’s certainly fuel for legal fireworks. That said, these torrent allegations are just a small fraction of the case, which will be fought tooth and nail by both sides.

A copy of the plaintiffs’ motion for relief from the non-dispositive pretrial order, submitted on January 15, is available here (pdf). A copy of the third amended complaint can be found here (pdf).

From: TF, for the latest news on copyright battles, piracy and more.

Author: oxy