• Pennomi@lemmy.world
    8 months ago

    I’m quite familiar. Legally it works: if you can prove that your data actually made it into the training set, you might be able to successfully sue them. That’s extremely unlikely though. If you can’t litigate a law, then it essentially doesn’t exist.

    Besides, a researcher scraping websites isn’t going to take the time to filter out random pieces of data based on a link contained in the body. If you can show me a research paper or blog post or something where a process is described to sanitize the input data based on license, that would be pretty damn interesting. Maybe it’ll exist in the future?
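    To make concrete what such a step might look like, here is a purely hypothetical sketch; it does not describe any real training pipeline. It assumes the license is signalled by a Creative Commons NonCommercial name or link somewhere in the scraped text, and the names `NC_LICENSE` and `drop_noncommercial` are invented for this example.

```python
import re

# Assumption: a NonCommercial license is signalled either by a link to
# creativecommons.org/licenses/by-nc... or by the short name "CC BY-NC...".
NC_LICENSE = re.compile(
    r"creativecommons\.org/licenses/by-nc|\bCC BY-NC\b",
    re.IGNORECASE,
)


def drop_noncommercial(documents):
    """Return only the documents that carry no NonCommercial license marker."""
    return [doc for doc in documents if not NC_LICENSE.search(doc)]


scraped = [
    "Just a regular comment with no license notice.",
    "Some comment text.\nAnti Commercial-AI license (CC BY-NC-SA 4.0)",
]
kept = drop_noncommercial(scraped)
```

    A plain text match like this is crude: it would also drop comments that merely mention the license, and it says nothing about whether anyone actually runs such a filter.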

    Besides, the best way to opt out of AI training is to enable site-wide flags, which mark the content therein as off limits. That would have the benefit of protecting not only you but everyone else on the site. Lobbying your Lemmy instance to enable that will get a lot more mileage than anything else you could do, because it’s an industry-sanctioned way to accomplish what you want.
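    For the robots.txt flavor of those flags, here is a minimal sketch of what a cooperating crawler would check before fetching a page, using Python’s standard `urllib.robotparser`. The robots.txt contents, the `lemmy.example` URL, and the helper name `may_fetch` are made up for illustration; `GPTBot` and `CCBot` are the crawler tokens published by OpenAI and Common Crawl. Compliance is voluntary, so this only shows what a well-behaved scraper would see.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt an instance admin might serve: block two known
# AI-related crawlers site-wide, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://lemmy.example/post/1"  # placeholder URL
gptbot_ok = may_fetch("GPTBot", page)
browser_ok = may_fetch("SomeBrowser", page)
```

    With this file, `gptbot_ok` comes out false and `browser_ok` true, and the rule covers every post on the instance rather than one person’s comments.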

    • Cosmic Cleric@lemmy.world
      8 months ago

      I’m quite familiar. Legally it works: if you can prove that your data actually made it into the training set, you might be able to successfully sue them. That’s extremely unlikely though. If you can’t litigate a law, then it essentially doesn’t exist.

      And what makes you think that can’t be done? You make it sound like, because (you believe) it’s so hard to do, you shouldn’t even bother trying. That seems really defeatist.

      And like I’ve said multiple times now, it’s a simple, quick copy and paste, a ‘low-hanging fruit’ way of licensing/protecting a comment. If it works, great, it works.

      Besides, the best way to opt out of AI training is to enable site-wide flags, which mark the content therein as off limits.

      I have no control over the Lemmy servers, I only have control over my own comments that I post.

      Also, the two options are not mutually exclusive.

      because it’s an industry-sanctioned way to accomplish what you want.

      Again, both you and I know the history of the robots.txt file and how often and how well it’s honored, especially these days with the new frontier of AI modeling.

      It would be best to do both, just to make sure you have coverage, so that if the robots.txt is not honored, at least the comment itself is still licensed.

      Anti Commercial-AI license (CC BY-NC-SA 4.0)