

I think provenance has value outside copyright… here’s a hypothetical scenario:
libsomeshit is licensed under MIT-0, so it doesn't even require attribution. Version 3.0 introduced a security exploit. It was fixed in version 6.23 and widely reported.
A plagiaristic LLM with a training cutoff before 6.23 can just shit out the exploit in question, even though it has already been fixed.
A less plagiaristic LLM could RAG in the current version of libsomeshit, perhaps avoid introducing the exploit, and update the BOM with a reference to “libsomeshit 6.23”, so that when version 6.934 fixes some other big bad exploit, an automated tool can raise an alarm.
Better yet, it could actually add a proper dependency instead of copy-pasting code.
And it wouldn't need to store libsomeshit in its weights at the same fidelity (which is extremely expensive). It would just need to be able to shit out the key to look it up in a vector database.
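Roughly what that automated check could look like, as a minimal Python sketch. Everything here is made up for illustration: the BOM is a plain list of name/version dicts (a real one would be something like CycloneDX or SPDX), the `Advisory` record and `audit` function are hypothetical, and it assumes plain dotted numeric versions like 6.23 and 6.934.

```python
from dataclasses import dataclass


def parse_version(v: str) -> tuple[int, ...]:
    """Turn a dotted numeric version like '6.23' into (6, 23) so tuples compare correctly."""
    return tuple(int(part) for part in v.split("."))


@dataclass(frozen=True)
class Advisory:
    """Hypothetical advisory record: which versions of a package are affected."""
    package: str
    fixed_in: str             # first version that contains the fix
    introduced_in: str = "0"  # first affected version; "0" means "everything before the fix"
    note: str = ""


def affected(version: str, adv: Advisory) -> bool:
    """True if `version` falls in the half-open range [introduced_in, fixed_in)."""
    v = parse_version(version)
    return parse_version(adv.introduced_in) <= v < parse_version(adv.fixed_in)


def audit(bom: list[dict], advisories: list[Advisory]) -> list[str]:
    """Return one alarm string per (BOM entry, advisory) pair that matches."""
    alarms = []
    for entry in bom:
        for adv in advisories:
            if entry["name"] == adv.package and affected(entry["version"], adv):
                alarms.append(
                    f'{entry["name"]} {entry["version"]}: fixed in {adv.fixed_in} ({adv.note})'
                )
    return alarms


if __name__ == "__main__":
    bom = [{"name": "libsomeshit", "version": "6.23"}]
    advisories = [
        Advisory("libsomeshit", fixed_in="6.23", introduced_in="3.0", note="exploit introduced in 3.0"),
        Advisory("libsomeshit", fixed_in="6.934", note="some other big bad exploit"),
    ]
    for alarm in audit(bom, advisories):
        print(alarm)
```

With libsomeshit 6.23 in the BOM, the old 3.0 advisory doesn't fire (6.23 already has that fix), but the 6.934 one does. Copy-pasted code that never made it into the BOM can't trigger anything, which is the whole problem.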
I think the market right now is far too distorted by idiots with money trying to build the robot god. Code plagiarism is an integral part of it, because it makes the LLM appear closer to the singularity (it can write code for itself! it's gonna recursively self-improve!).
It's fucking disgusting how they denigrate the very work they built their fucking business on. I think it's a mixture of the two, though: they want it plagiarized so that their bot looks capable of more coding than it actually is.
Oh, absolutely. My current project is sitting in a private git repo hosted on a VPS. And there's no fucking way I'll share it under anything less than GPLv3.
We need a license with AI-specific language. Forbidding training outright won't work (they just claim fair use).
I was thinking of adding a requirement that the license header must not be removed unless the tool includes a specific string in the comments (“This code was adapted from libsomeshit_6.23”), so that security fixes can propagate and the authors keep a consulting market. In the US they do own the judges, but in the rest of the world the minuscule alleged benefit of not attributing would be weighed against the harm to their customers (security fixes not propagated) and to the authors (missed consulting gigs).
edit: perhaps even an explainer stating that the authors consider non-attribution fundamentally fraudulent toward the user of the coding tool: the authors of libsomeshit routinely publish security fixes, and the user of the coding tool, who has been deceived into believing the code was created de novo by the tool, is likely to be harmed when hackers use those published fixes to find and exploit the same bug in the copied code (which wouldn't be possible if the code had in fact been created de novo).
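The proposed notice string is also trivially machine-readable, which is what makes the propagation argument work. A minimal Python sketch, assuming the exact `This code was adapted from <name>_<version>` format from the license wording above; the regex and the `find_provenance` / `bom_entries` names are hypothetical, not any existing tool:

```python
import re

# Hypothetical notice format from the proposed license wording:
#   "This code was adapted from <name>_<version>"
# The version is dotted digits only, so a trailing period in a comment is not captured.
NOTICE = re.compile(r"This code was adapted from ([\w-]+)_(\d+(?:\.\d+)*)")


def find_provenance(source: str) -> list[tuple[str, str]]:
    """Return (package, version) pairs for every adaptation notice in a source file."""
    return NOTICE.findall(source)


def bom_entries(source: str) -> list[dict]:
    """Turn the notices into BOM-style entries that a dependency scanner could check."""
    return [{"name": name, "version": version} for name, version in find_provenance(source)]


if __name__ == "__main__":
    snippet = (
        "// This code was adapted from libsomeshit_6.23\n"
        "int frob(int x) { return x * 2; }\n"
    )
    print(bom_entries(snippet))
```

The resulting entries could be fed to whatever vulnerability scanner the project already runs, so the copied code gets flagged when a later libsomeshit release fixes something.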