Suing Writers Seethe at OpenAI's Excuses in Court

floofloof@lemmy.ca · 2 years ago

makeasnek@lemmy.ml · edited 2 years ago

Amazing how every new generation of technology has a generation of users of the previous technology who do whatever they can to stop its advancement. This technology takes human creativity and output to a whole new level, it will advance medicine and science in ways that are difficult to even imagine, it will provide personalized educational tutoring to every student regardless of income, and these people are worried about the technicality of what the AI is trained on and often don’t even understand enough about AI to make an argument about it. If people like this win, whatever country’s legal system they win in will not see the benefits that AI can bring. That society is shooting itself in the foot.

Your favorite musician listened to music that inspired them when they made their songs. Listening to other people’s music taught them how to make music. They paid for the music (or somebody did via licensing fees or it was freely available for some other reason) when they listened to it in the first place. When they sold records, they didn’t have to pay the artist of every song they ever listened to. That would be ludicrous. An AI shouldn’t have to pay you because it read your book and millions like it to learn how to read and write.

Allseer@futurology.today · 2 years ago

You’re humanizing the software too much. Comparing software to human behavior is just plain wrong. GPT can’t even reason properly yet. I can’t see this as anything other than a more advanced collage process.

OpenAI used intellectual property without consent of the owners. Major fucked.

If ‘anybody’ does anything similar to tracing, copy&pasting or even sampling a fraction of another person’s imagery or written work, that anybody is violating copyright.

hoshikarakitaridia@sh.itjust.works · 2 years ago

sampling a fraction of another person’s imagery or written work.

So citing is a copyright violation? A scientific discussion on a specific text is a copyright violation? This makes no sense. It would mean your work couldn’t build on anything else, and that’s plain stupid.

Also to your first point about reasoning and advanced collage process: you are right and wrong. Yes, an LLM doesn’t have the ability to use all the information a human has or be as precise, therefore it can’t reason the same way a human can. BUT, and that is a huge caveat, the inherent goal of AI, and in its simplest form neural networks, was to replicate human thinking. If you look at the brain and then at AIs, you will see how close the process is. It’s usually giving the AI an input, the AI tries to give the desired output, then the AI gets told what it should have looked like, and then it backpropagates to reinforce its process. This is already pretty advanced and human-like (even look at how the brain is made up and then how AI models are made up, it’s basically the same concept).
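The input / output / correction loop described above can be sketched in a few lines. This is a toy illustration only, assuming a "model" with a single weight and made-up sample data; with one weight, backpropagation reduces to a single gradient step, whereas a real network chains the same idea through many layers. The function name and the numbers are hypothetical.

```python
def train_single_weight(samples, learning_rate=0.05, epochs=50):
    """Fit prediction = w * x by repeated correction (gradient descent)."""
    w = 0.0  # the model starts out knowing nothing
    for _ in range(epochs):
        for x, target in samples:
            prediction = w * x             # the model gives its output
            error = prediction - target    # it is told what the output should have been
            gradient = 2 * error * x       # derivative of the squared error w.r.t. w
            w -= learning_rate * gradient  # nudge the weight to reduce the error
    return w

# Hypothetical training data where the desired output is always 2 * input.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train_single_weight(samples)
print(round(w, 4))
```

After training, `w` ends up very close to 2, the rule hidden in the sample data. What the loop keeps is the adjusted weight, not the individual samples.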

Now you would be right to say “well in its simplest form LLMs like GPT are just predicting which character or word comes next” and you would be partially right. But in that process it incorporates all of the “knowledge” it got from its training sessions and a few valuable tricks to improve. The truth is, differences between a human brain and an AI are marginal, and it mostly boils down to efficiency and training time.

And to say that LLMs are just “an advanced collage process” is like saying “a car is just an advanced horse”. You’re not technically wrong but the description is really misleading if you look into the details.

And for detail’s sake, this is what the paper for Llama2 looks like; the latest big LLM from Facebook that is said to be the current standard for LLM development:

https://arxiv.org/pdf/2307.09288.pdf

Tosti@feddit.nl · edited 2 years ago

deleted by creator

makeasnek@lemmy.ml · edited 2 years ago

No, that’s not how it works. It stores learned information like “word x is more likely to follow word y than word a” or “people from country x are more likely to consume food a than b”. That is what is distributed when the AI model is shared. To learn that, it just reads books zillions of times and updates its table of likelihoods. Just like an artist might listen to a Lil Wayne album hundreds of times and each time they learn a little bit more about his rhyme style or how beats work or whatever. It’s more complicated than that, but that’s a layperson’s explanation of how it works. The book isn’t stored in there somewhere. The book’s contents aren’t transferred to other parties.
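A minimal sketch of the “table of likelihoods” idea, assuming a plain word-pair (bigram) count over a made-up sentence. Real LLMs hold learned neural-network weights rather than a literal lookup table, so this only illustrates the layperson’s version above; the function names and the corpus are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram_table(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

def next_word_probabilities(table, word):
    """Turn the raw follow-up counts for one word into probabilities."""
    counts = table.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

# Made-up training text, for illustration only.
corpus = "the cat sat on the mat and the cat slept"
table = train_bigram_table(corpus)
probs = next_word_probabilities(table, "the")
print(probs)
```

Here “cat” follows “the” in two of three cases and “mat” in one, so the table says “cat” is the more likely next word.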

Madison_rogue@kbin.social · edited 2 years ago

The learning model is artificial, vs a human that is sentient. If a human learns from a piece of work, that’s fine if they emulate styles in their own work. However, sample that work, and the original artist is due compensation. This was a huge deal in the late 80s with electronic music sampling earlier musical works, and there are several cases of copyright that back original owners’ claim of royalties due to them.

The lawsuits allege that the models used copyrighted work to learn. If that is so, writers are due compensation for their copyrighted work.

This isn’t litigation against the technology. It’s litigation around what a machine can freely use in its learning model. Had ChatGPT, Meta, etc., used works in the public domain this wouldn’t be an issue. Yet it looks as if they did not.

EDIT

And before someone mentions that the books may have been bought and then used in the model, it may not matter. The Birthday Song is a perfect example of copyright that caused several restaurant chains to use other tunes up until the copyright was overturned in 2016. Every time the AI uses the copied work in its output it may be subject to copyright.

Heratiki@lemmy.ml · 2 years ago

The creator of ChatGPT is sentient. Why couldn’t it be said that this is their expression of the learned works?

Madison_rogue@kbin.social · 2 years ago

https://crsreports.congress.gov/product/pdf/LSB/LSB10922

Heratiki@lemmy.ml · 2 years ago

I’ve glanced at these a few times now and there are a lot of ifs, ands, and buts in there.

I’m not understanding how an AI itself infringes on the copyright, as it has to be directed in its creation at this point (GPT specifically). How is that any different than me using a program that will find a specific piece of text and copy it for use in my own document? In that case the document would be presented by me and thus I would be infringing, not the software. AI (for the time being) is simply software and incapable of infringement. And suing a company that makes the AI simply because it used data to train its software doesn’t work, because that is not infringement: the works are not copied verbatim from their original source unless specifically requested by the user. That would put the infringement on the user.

Phanatik@kbin.social · 2 years ago

There’s a bit more nuance to your example. The company is liable for building a tool that allows plagiarism to happen. That’s not down to how people are using it, that’s just what the tool does.

Heratiki@lemmy.ml · 2 years ago

So a company that makes lock picking tools is liable for when a burglar uses them to steal? Or a car manufacturer is liable when someone uses their car to kill? How about knives, guns, tools, chemicals, restraints, belts, rope, and I could go on and nearly use every single word in the English language, yet none of those manufacturers can be sued for someone misusing their products. They’d have to show intent of maliciousness, which I just don’t see is possible in the context they’re seeking.

Kichae@kbin.social · edited 2 years ago

It’s litigation around what a machine can freely use in its learning model.

No, it’s not that, either. It’s litigation around what resources a person can exploit to develop a product without paying for that right.

The machine is doing nothing wrong. It’s not feeding itself.

LemmysMum@lemmy.world · 2 years ago

I can read a copyrighted work and create a work from the experience and knowledge gained. At what point is what I’m doing any different to the A.I.?

mkhoury@lemmy.ca · 2 years ago

For one thing: when you do it, you’re the only one that can express that experience and knowledge. When the AI does it, everyone can express that experience and knowledge. It’s kind of like the difference between artisanal and industrial. There’s a big difference of scale that has a great impact on the livelihood of the creators.

LemmysMum@lemmy.world · 2 years ago

Yes, it’s wonderful. Knowledge might finally become free in the advent of AI tools and we might finally see the death of the copyright system. Oh how we can dream.

Phanatik@kbin.social · 2 years ago

I’m not sure what you mean by this. Information has always been free if you look hard enough. With the advent of the internet, you’re able to connect with people who possess this information and you’re likely to find it for free on YouTube or other websites.

Copyright exists to protect against plagiarism or theft (in an ideal world). I understand the frustration that comes with archaic laws and that updates to laws move at a glacial pace. However, the death of copyright harms more people than you’re expecting.

Piracy has existed as long as the internet has. Companies have been complaining ceaselessly about lost profits but once LLMs came along, they’re fine with piracy if it’s been masked behind a glorified search algorithm. They’re fine with cutting jobs and replacing them with an LLM that produces less quality output at significantly cheaper rates.

LemmysMum@lemmy.world · 2 years ago

Information has always been free if you look hard enough. With the advent of the internet, you’re able to connect with people who possess this information and you’re likely to find it for free on YouTube or other websites.

And with the advent of AI we no longer have to look hard.

BraveSirZaphod@kbin.social · edited 2 years ago

There is a practical difference in the time required and sheer scale of output in the AI context that makes a very material difference on the actual societal impact, so it’s not unreasonable to consider treating it differently.

Set up a lemonade stand on a random street corner and you’ll probably be left alone unless you have a particularly Karen-dominated municipal government. Try to set up a thousand lemonade stands in every American city, and you’re probably going to start to attract some negative attention. The scale of an activity is a relevant factor in how society views it.

Phanatik@kbin.social · 2 years ago

For one thing, you can do the task completely unprompted. The LLM has to be told what to do. On that front, you have an idea in your head of the task you want to achieve and how you want to go about doing it, the output is unique because it’s determined by your perceptions. The LLM doesn’t really have perceptions, it has probabilities. It’s broken down the outputs of human creativity into numbers and is attempting to replicate them.

LemmysMum@lemmy.world · edited 2 years ago

The ai does have perceptions, fed into by us as inputs. I give the ai my perceptions, the ai creates a facsimile, and I adjust the perceptions I feed into the ai until I receive an output that meets the needs of my requirements, no different from doing it myself except I didn’t need to read all the books, and learn all the lessons myself. I still tailor the end product, just not to the same micro scale that we needed to traditionally.

Phanatik@kbin.social · 2 years ago

You can’t feed it perceptions any more than you can feed me your perceptions. You give it text and the quality of the output is determined by how the LLM has been trained to understand that text. If by feeding it perceptions, you mean what it’s trained on, I have to remind you that the reality GPT is trained on is the one dictated by the internet with all of its biases. The internet is not a reflection of reality, it’s how many people escape from reality and share information. It’s highly subject to survivorship bias. If the information doesn’t appear on the internet, GPT is unaware of it.

To give an example, if GPT gives you a bad output and you tell it that it’s a bad output, it will apologise. This seems smart but it’s not really. It doesn’t actually feel remorse, it’s giving a predetermined response based on what it’s understood by your text.

LemmysMum@lemmy.world · edited 2 years ago

We’re not talking about perceptions as in making an AI literally perceive anything. I can feed you prompts and ideas of my own and get an output no different than if I was using AI tools, the difference being ai tools have already gathered the collective knowledge you’d get from say doing a course in photoshop, taking an art class, reading an encyclopaedia or a novel, going to school for music theory, etc.

Dudewitbow@lemmy.ml · 2 years ago

It’s less about copying the work, it’s more like looking at patterns that appear in a work.

To bring a very rudimentary example: if I wanted a word and the first letter was Q, what would the second letter be?

Of course, statistically, the next letter is u, and it’s not common for words starting with Q to have a different letter after that. ML/AI is like taking these small situations, but having a ridiculous amount of parameters to come up with something based on several internal models. These parameters of course generally have some context.

It’s like if you were told to read a book thoroughly, and then afterwards were told to reproduce the same book. You probably cannot make it 1:1, but could probably get the general gist of a story. The difference between you and the machine is the machine read a lot of books, and contextually knows patterns so that it can generate something similar faster and more accurately, but not exactly the original one for one thing.
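The Q example can be checked with a few lines of counting. This is a rough sketch, assuming a tiny hand-picked word list; the function name and the words are hypothetical, and an actual model learns far richer patterns than a single letter pair.

```python
from collections import Counter

def next_letter_counts(words, first_letter):
    """Count which second letter follows a given first letter."""
    counts = Counter()
    for word in words:
        w = word.lower()
        if len(w) >= 2 and w[0] == first_letter:
            counts[w[1]] += 1
    return counts

# Hand-picked sample words, for illustration only.
sample_words = ["queen", "quick", "quiet", "quote", "qatar", "cat", "dog"]
counts = next_letter_counts(sample_words, "q")
most_common_letter, _ = counts.most_common(1)[0]
print(most_common_letter, dict(counts))
```

On this sample, u follows q four times and a once, so u is the statistically likely next letter.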

mkhoury@lemmy.ca · 2 years ago

I don’t think that Sarah Silverman and the others are saying that the tech shouldn’t exist. They’re saying that the input to train them needs to be negotiated as a society. And the businesses also care about the input to train them because it affects the performance of the LLMs. If we do allow licensing, watermarking, data cleanup, synthetic data, etc. in a way that is transparent, I think it’s good for the industry and it’s good for the people.

Dr Cog@mander.xyz · 2 years ago

I don’t need to negotiate with Sarah Silverman if I’m handed her book by a friend, and neither should an AI.

Noved@lemmy.ca · 2 years ago

But you do need to negotiate with Sarah Silverman if you take that book, rearrange the chapters, and then try to sell it for profit. Obviously that’s extremified, but it’s the argument they’re making.

Dr Cog@mander.xyz · 2 years ago

I agree. But that isn’t what AI is doing, because it doesn’t store the actual book and it isn’t possible to reproduce any part in a format that is recognizable as the original work.

Heratiki@lemmy.ml · 2 years ago

That’s not what this is. To use your example it would be like taking her book and rearranging ALL of the words to make another book and selling that book. But they’re not selling the book or its contents, they’re selling how their software interprets the book for the benefit of the user. This would be like suing teachers for teaching about their book.

iegod@lemm.ee · 2 years ago

Definitely not how that output works. It will come up with something that seems like a Sarah Silverman created work but isn’t. It’s like calling copyright on impersonations. I don’t buy it.

Heratiki@lemmy.ml · 2 years ago

Yes. Imagine how much trouble ANY actor would be in if they were sued for impersonating someone nearly identical but not that person. If Sarah Silverman ever interacted with a person and then imitated that person on stage for her own personal benefit without the other person’s express consent, it would be no different. And comedians pick up their comedy from everything around them, both natural and imitation.

iegod@lemm.ee · 2 years ago

100%. I just can’t get behind any of these arguments against AI from this segment of workers. This is no different than other rallies against technological evolution due to fear of job losses. Their scarce commodity will soon disappear and that’s what they’re actually afraid of.

Heratiki@lemmy.ml · 2 years ago

It’s easy. They’re grasping at straws because their career isn’t what it used to be. It’s something new and viral so it must be an easy target to exploit for money. Personally I’d be on top of it and setting up contracts to allow AI to use my likeness for a small subset of the usual pay. I just can’t imagine not taking advantage of the ability to do absolutely nothing and still get paid for it. Instead they appear to actively be trying to tear it down. If they were wanting to set guidelines then they would be rallying congress not suing a company based on how you FEEL it should be.

ag_roberston_author@beehaw.org · 2 years ago

An LLM isn’t human and shouldn’t be treated the same as a human. It’s as foolish as corporate personhood.

Dr Cog@mander.xyz · 2 years ago

The argument is less that an LLM is a human and more that it is not a copyright violation to use a material to train the LLM. By current legal definitions, it is fair use unless the material is able to be reproduced in its entirety (or at least, in some meaningful way).

ag_roberston_author@beehaw.org · 2 years ago

By current legal definitions

Yeah, definitions that were written before this technology existed. I don’t base my opinions on what is legal; legality is nothing more than rules determined by those in power.

Instead, I base them on what is ethical, and the consumption of material by LLMs and other AIs without the express permission of its creator is unethical.

Madison_rogue@kbin.social · 2 years ago

Except the AI owner does. It’s like sampling music for a remix or integrating that sample into a new work. Yes, you do not need to negotiate with Sarah Silverman if you are handed a book by a friend. However, if you use material from that book in a work, it needs to be cited. If you create an IP based off that work, Sarah Silverman deserves compensation because you used material from her work.

No different with AI. If the AI used intellectual property from an author in its learning algorithm, then if that intellectual property is used in the AI’s output, the original author is due compensation under certain circumstances.

Dr Cog@mander.xyz · edited 2 years ago

Neither citation nor compensation is necessary for fair use, which is what occurs when an original work is used for its concepts but not reproduced.

SheeEttin@lemmy.world · 2 years ago

Sure, but fair use is rather narrowly defined. You must consider the purpose, nature, amount, and effect. In the case of scraping entire bodies of work as training data, the purpose is commercial, the nature is not in the public interest, the amount is the work in its entirety, and the effect is to compete with the original author. It fails to meet any criteria for fair use.

Dr Cog@mander.xyz · 2 years ago

The work is not reproduced in its entirety. Simply using the work in its entirety is not a violation of copyright law, just as reading a book or watching a movie (even if pirated) is not a violation. The reproduction of that work is the violation, and LLMs simply do not store the works in their entirety nor are they capable of reproducing them.

SheeEttin@lemmy.world · 2 years ago

It doesn’t have to be reproduced to be a copyright violation, only used. For example, publishing your Harry Potter fanfic would be infringement. You’re not reproducing the original material in any way, but you’re still heavily depending on it.

Electricblush@lemmy.world · 2 years ago

Breach of trademark, not copyright, whole different barrel of fish.

iegod@lemm.ee · 2 years ago

It is different. That knowledge from her book forms part of your processing and allows you to extract features and implement similar outputs yourself. The key difference between the AI module and dataset is that it’s codified in bits, versus whatever neural links we have in our brain. So if one theoretically creates a way to codify your neural network you might be subject to the same restrictions we’re trying to levy on ai. And that’s bullshit.

HubertManne@kbin.social · 2 years ago

It’s a bit more than that if the AI is told to make something in the style of.

andruid@lemmy.ml · 2 years ago

I mean, people have been doing new works in the style of other artists for a while as well.

HubertManne@kbin.social · 2 years ago

Yeah, again, they can’t crank out a new one every 5 minutes, and actually it would overwhelm the courts, as it’s very easy for those works to be too similar. Take the guy who tried to sue Disney by writing a book based on Finding Nemo when he found out they were making a story like that. He was shady and tried to play timeline games, but he did not need to make a story just like it.

rgb3x3@beehaw.org · 2 years ago

Yeah, and a person could make something in the style of someone else. And it would only be copyright infringement if the work does not meaningfully change the original and give credit to the original artist.

How is this any different?

HubertManne@kbin.social · 2 years ago

Mainly because it’s just too easy. We should limit time periods for IP, but while it’s in force it should not be able to be used by AI, to me. Keep IP to 20 years and let AI have it at that point.

Franzia@lemmy.blahaj.zone · 2 years ago

Amazing how every generation of technology has an asshole billionaire or two stealing shit to be the first in line to try and monopolize society’s progress.

ag_roberston_author@beehaw.org · 2 years ago

This technology takes human creativity and output to a whole new level,

No, it doesn’t. There’s nothing “human” or “creative” about the output of AI.