AI industry horrified to face largest copyright class action ever certified

MicroWave@lemmy.world · edit-2 2 days ago

N0t_Legal_Advice@lemmy.today · 14 hours ago

Oh no…I’m super bummed. 🤭

FauxLiving@lemmy.world · edit-2 18 hours ago

I’ve read this entire thread and could not find a single person who seems to have actually read anything about this case.

The article is a huge pile of bullshit.

Here is what happened: An industry group filed an amicus brief during the appeal of a ruling where the judge certified the 3 plaintiffs as a class. Boring legal minutiae in a case that doesn’t matter, see below.

The author is either incompetent at understanding legal filings or deliberately being misleading to write clickbait trash. Human slop, if you prefer.

This is not noteworthy, at all. The issue being argued about is whether the 3 people can represent the class of “everyone Anthropic downloaded books from”. This is a non-story, unless you’re a legal nerd and care about exactly how courts define classes and the legal steps required for the analysis.

But, more importantly for the frothing anti-AI masses:

In the order certifying the plaintiffs as a class, the judge dismissed the plaintiffs’ claims of copyright violation related to the training of LLMs. The judge said that training LLMs was transformative and thus fair use under copyright law, and that this was so obvious that the argument could be summarily dismissed.

Don’t believe me? Go click on the links in the article to the summary judgement yourself. The information is not hard to find if you read farther than the headline.

The only remaining issue in the lawsuit is whether Anthropic is civilly liable for downloading the books on BitTorrent.

This case isn’t even about AI anymore, it’s the same kind of lawsuit that we’ve seen since Napster was popular. Uploading copyrighted material, like when you use BitTorrent, is a copyright violation and you could be sued.

That’s all this case is now. The argument that everyone is fighting over in the comments, “Is training an LLM on copyrighted material a violation of copyright?”, is already answered by the judge:

No, using copyrighted material to train an LLM is so obviously fair use that the argument was summarily dismissed.

Here’s the relevant quote from the judge, in summary judgement:

To summarize the analysis that now follows, the use of the books at issue to train Claude and its precursors was exceedingly transformative and was a fair use under Section 107 of the Copyright Act. The digitization of the books purchased in print form by Anthropic was also a fair use, but not for the same reason as applies to the training copies. Instead, it was a fair use because all Anthropic did was replace the print copies it had purchased for its central library with more convenient, space-saving, and searchable digital copies without adding new copies, creating new works, or redistributing existing copies. However, Anthropic had no entitlement to use pirated copies for its central library, and creating a permanent, general-purpose library was not itself a fair use excusing Anthropic’s piracy.

cheese_greater@lemmy.world · 19 hours ago

🍿

caboose2006@lemmy.world · 1 day ago

“copyright class action could ruin AI industry”

Oh nooooooo… How do I sign on to this lawsuit?

Ensign_Crab@lemmy.world · 1 day ago

They have the entire public domain at their disposal.

If giant megacorporations didn’t want their chatbots talking like the 1920s, they shouldn’t have spent the past century robbing society of a robust public domain.

N0t_5ure@lemmy.world · 2 days ago

“If we have to pay for the intellectual property that we steal and repackage, our whole business model will be destroyed!”

errer@lemmy.world · 2 days ago

One thing this whole AI training debacle has done for me: made me completely guilt-free in pirating things. Copyright law has been bullshit since Disney stuck their finger in it and if megacorps can get away with massively violating it, I’m not going to give a shit about violating it myself.

bss03@infosec.pub · 1 day ago

For me it was Disney floating the idea of asking that the wrongful death suit be dismissed because of the liability waiver in a Disney+ free trial.

I have the $$$, but I don’t agree with the terms for any of the streaming services, so I’ll just sail the seven seas and toss a doubloon (coin) to independent creators (my witchers) when I can.

Kühlschrank@lemmy.world · 1 day ago

I’m pretty much there too, the whole industry consolidates on the new things and charges us as they make it worse. And there can be some arguments to be made over the benefits of AI but we all know that it will not be immune to the enshittification that has already ruined all the things before it.

aramis87@fedia.io · 2 days ago

If I downloaded ten movies to watch with my nephew in the cancer ward, they’d sue me into oblivion. Downloading tens of millions of books and claiming your business model depends on it doesn’t make it okay. And sharing movies with my sick nephew would cause less harm to society and to the environment than AI does.

FauxLiving@lemmy.world · 18 hours ago

“If we have to pay for the intellectual property that we steal and repackage, our whole business model will be destroyed!”

They are very likely to be civilly liable for uploading the books.

That’s largely irrelevant because the judge already ruled that using copyrighted material to train an LLM was fair use.

The judge did so on summary judgment, which means that they have to read all of the evidence in a manner most favorable to the plaintiff and they still decided that there is no way for the plaintiff to succeed in their copyright claim about training LLMs because it was so obviously fair use.

N0t_5ure@lemmy.world · 13 hours ago

Read the Order, which is Exhibit B to Anthropic’s appellate brief.

Anthropic admitted that they pirated millions of books like Meta did, in order to create a massive central library for training AI that they permanently retained, and now assert that if they are held responsible for this theft of IP it will destroy the entire AI industry. In other words, it appears that this is common practice in the AI industry to avoid the prohibitive cost of paying for the works they copy. Given that Meta, one of the wealthiest companies in the world, did the same exact thing, it reinforces the understanding that piracy to avoid paying for their libraries is a central component of training AI.

While the lower court did rule that training an LLM on copyrighted material was a fair use, it expressly did not rule that derivative works produced are protected by fair use and preserved the issue for further litigation:

Again, Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public. If that were not so, this would be a different case. Authors remain free to bring that case in the future should such facts develop.

Emphasis added. In other words, Anthropic can still face liability if its trained AI produces knockoff works.

Finally, the Court held:

The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained “forever” for “general purpose” even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience. … We will have a trial on the pirated copies used to create Anthropic’s central library and the resulting damages, actual or statutory (including for willfulness). That Anthropic later bought a copy of a book it earlier stole off the internet will not absolve it of liability for the theft but it may affect the extent of statutory damages. Nothing is foreclosed as to any other copies flowing from library copies for uses other than for training LLMs.

Emphasis in original.

So to summarize, Anthropic apparently used the industry standard of piracy to build a massive book library to train its LLMs. Plaintiffs did not dispute that training an LLM on a copyrighted work is fair use, but did not have sufficient information to assert that knockoff works were produced by the trained LLMs, and the Court preserved that issue for later litigation if the plaintiffs sought to bring such a claim. Finally, the Court noted that Anthropic built its database for training its LLMs through massive straight-up piracy. I think my original comment was a fair assessment.

FauxLiving@lemmy.world · 12 hours ago

It looks, to me, like you’re reading the briefing without understanding how the legal system functions. You’re making some incredibly basic mistakes. Copyright violations and theft are two distinct legal concepts, for example. You’re treating the case summary as if it were the legal argument in the brief and you’re misinterpreting some pretty clear legal language written by the judge.

Anthropic admitted that they pirated millions of books like Meta did, in order to create a massive central library for training AI that they permanently retained, and now assert that if they are held responsible for this theft of IP it will destroy the entire AI industry.

No, that is not their argument.

Their legal argument, in the appeal of the class certification, is that the judge did not apply the required analysis in order to certify the three plaintiffs as being part of a class. He instead relied on his intuition, not any discovered facts or evidence. This isn’t allowed when analyzing a case for class certification.

In addition, Anthropic adds, it is well supported in case law (cited in the motion) that copyright claims are a bad fit for class action.

This is because copyright law focuses on individual works, and each work has to be examined for its eligibility for copyright protection, the standing of the plaintiff, and whether, and to what extent, the defendant is responsible for infringing that individual work.

This can be done when 3 people claim a copyright violation, because they have a limited set of work which a court can reasonably examine.

A class action would require a court to consider hundreds or thousands of claimants and millions of individual works, each of which can be challenged individually by the defendant.

Courts typically don’t like to take on cases that can require millions of briefings, hearings and rulings. Because of this, courts almost always deny class action certification for copyright violations.

The court, in its order, did not address this or apply any of the required analysis. The class was certified based on vibes, something that doesn’t follow clearly established case law.

Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public.

This is because training an LLM results in a language model.

A language model is in no way similar to a book and so training one is a transformative use of copyrighted material and protected under fair use.

Authors concede that training LLMs did not result in any exact copies nor even infringing knockoffs of their works being provided to the public. If that were not so, this would be a different case. Authors remain free to bring that case in the future should such facts develop.

In other words, Anthropic can still face liability if its trained AI produces knockoff works.

No, the judge didn’t make any claim about the model’s output after training. That isn’t an issue that’s being addressed in this case. You’re misunderstanding how judges address issues in writing.

Here, the judge is addressing a very narrow issue, specifically the exact claim made by the plaintiff (training with copyrighted material = copyright violation).

The subject of the paragraph is concerned with training the LLM. The claim by the plaintiff is that using copyrighted works to train LLMs is a violation of copyright. That’s what the judge is addressing.

The judge dismissed this argument because it was transformative and so protected by fair use.

The judge further noted that the plaintiffs did not show that training the LLM resulted in “any exact copies nor even infringing knockoffs of their works being provided to the public” and if they could show that training the LLM resulted in “any exact copies nor even infringing knockoffs of their works being provided to the public” then they could bring a case in the future. This is the judge hinting that they can amend their filings in this case to clarify their argument, if they had any evidence to support their claim.

The judge is telling the plaintiff that in order to succeed in their claim, which is that training an LLM on their work is a violation of their copyright, they need to show that the training itself resulted in infringing copies or knockoffs.

The training resulted in a model. Creating a model is transformative (a model and a book are two completely different things) and the plaintiffs didn’t show that any infringing works were produced by the training and therefore they have no way of succeeding with their argument that training the model violated their rights.

You’re reading a lot into that statement that isn’t there. The plaintiffs never made a claim about the output of a trained model and so that argument wasn’t examined by the judge.

ThePantser@sh.itjust.works · edit-2 2 days ago

I started my own streaming service with pirated content. My business model depends on that data on my server.

Same thing but for some reason it’s different. They hate when we use their laws against them. Let’s root for them to rule against this class action so we can all benefit from copyright being thrown out. Or alternatively it kills AI companies, either way is a win.

Knock_Knock_Lemmy_In@lemmy.world · 20 hours ago

They hate when we use their laws against them

YSK. They, we and them in this sentence mean different things to different people.

skuzz@discuss.tchncs.de · 1 day ago

We’ll get a good taste of just how corrupt the US legal system now is, instead. Copyright law will still apply to us plebs, the Executive branch will overstep its powers, requiring some mafioso payoff from AI companies to keep doing what they do. The case will go away, mysteriously.

prole@lemmy.blahaj.zone · edit-2 17 hours ago

Worse, a precedent will be set for future copyright cases

Bakkoda@sh.itjust.works · edit-2 2 days ago

That’s unfair. They also have to sue people who infringe on “their” IP. You just don’t understand what it’s like to be a content creator.

CharlesDarwin@lemmy.world · 1 day ago

Yeah, who the fuck gave all these rich assholes the right to make money on others’ work?

I’d like to know how these assholes get away with even training on GPL licensed code.

iknowitwheniseeit@lemmynsfw.com · 19 hours ago

Making money on other people’s work is literally capitalism.

Capitalists take the surplus of workers, because they own the means of production.

SugarCatDestroyer@lemmy.world · 17 hours ago

Well, they’ve always profited from other people’s labor, and now they think that our souls belong to them too. They’ve gotten completely brazen!

It’s like they took only part of the wheat from the peasants before, and then decided to take it all by force and cunning, down to the last grain lol. :3

TankovayaDiviziya@lemmy.world · 1 day ago

I would not hold my breath. There is a high likelihood that the courts will side with AI companies because the American courts are compromised.

gravitas_deficiency@sh.itjust.works · 1 day ago

I’m no fan of the copyright fuckery so commonly employed by (amongst others) the RIAA and MPAA, but this is honestly the best use of copyright law I can think of in recent memory.

Azal@pawb.social · 1 day ago

It’s the neat part with giant monsters… sometimes they tread on each other’s toes and they stop eating us to tear each other apart and we get to sit back and watch.

pabens@infosec.pub · 20 hours ago

Tell it to Joel

SugarCatDestroyer@lemmy.world · 17 hours ago

I’ll damn well get dressed and go to court in slippers and underwear, even in hellish heat and in another country! I hope these bastards go bankrupt!

Viking_Hippie@lemmy.dbzer0.com · edit-2 1 day ago

AI industry, fucking around: Woo! This is awesome! No consequences ever! Just endless profits!

AII, finding out: this fucking sucks! So unfair!

thedruid@lemmy.world · 2 days ago

Hmm. I’m finding it hard to come up with a more clever response to them than:

“good”

SeductiveTortoise@piefed.social · 2 days ago

Not clever, but shared

xxce2AAb@feddit.dk · 2 days ago

You had me at “financially ruin AI industry”.

floofloof@lemmy.ca · 2 days ago

If the appeals court denies the petition, Anthropic argued, the emerging company may be doomed. As Anthropic argued, it now “faces hundreds of billions of dollars in potential damages liability at trial in four months” based on a class certification rushed at “warp speed” that involves “up to seven million potential claimants, whose works span a century of publishing history,” each possibly triggering a $150,000 fine.

Maybe they should have thought of that before they ripped off a century’s worth of published literature?