• Pennomi@lemmy.world
      8 months ago

      I just think it’s silly that people think it actually works.

      Besides, if AI really is powerful enough to make a splash in the world, wouldn’t you WANT it to contain your data? That would make it more favorable to your viewpoints.

      • Cosmic Cleric@lemmy.world
        8 months ago

        I just think it’s silly that people think it actually works.

        Are you a lawyer? Are you familiar with the Creative Commons license?

        If not, please feel free to get back to us after you get your degree, and let us all know what the final word is on this.

        Besides, if AI really is powerful enough to make a splash in the world, wouldn’t you WANT it to contain your data?

        Oh, I would love that, if they paid me to use my content under terms I agree with (betterment of Humankind, etc.).

        Anti Commercial-AI license (CC BY-NC-SA 4.0)

        • Pennomi@lemmy.world
          8 months ago

          I’m quite familiar. It legally works: if you can prove that your data actually made it into the training set, you might be able to successfully sue them. That’s extremely unlikely, though. If you can’t litigate a law, then it essentially doesn’t exist.

          Besides, a researcher scraping websites isn’t going to take the time to filter out random pieces of data based on a link contained in the body. If you can show me a research paper or blog post that describes a process for sanitizing input data based on license, that would be pretty damn interesting. Maybe it’ll exist in the future?

          Anyway, the best way to opt out of AI training is to enable site-wide flags, which mark the content therein as off limits. That would have the benefit of protecting not only you, but everyone else on the site. Lobbying your Lemmy instance to enable that will get a lot more mileage than anything else you could do, because it’s an industry-sanctioned way to accomplish what you want.
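          For reference, the usual site-wide mechanism is the instance’s robots.txt. A sketch of what that could look like (GPTBot, CCBot, and Google-Extended are real published crawler tokens, but honoring them is entirely voluntary on the crawler’s side):

          ```
          # Sketch of an AI-crawler opt-out in robots.txt.
          # Compliance is voluntary; these tokens are published by the crawlers.

          User-agent: GPTBot          # OpenAI's training crawler
          Disallow: /

          User-agent: CCBot           # Common Crawl
          Disallow: /

          User-agent: Google-Extended # Google's AI-training opt-out token
          Disallow: /
          ```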

          • Cosmic Cleric@lemmy.world
            8 months ago

            I’m quite familiar. It legally works: if you can prove that your data actually made it into the training set, you might be able to successfully sue them. That’s extremely unlikely, though. If you can’t litigate a law, then it essentially doesn’t exist.

            And what makes you think that can’t be done? You make it sound like, because (you believe) it’s so hard to do, no one should even bother trying. That seems really defeatist.

            And like I said multiple times now, it’s a simple, quick copy-and-paste, a ‘low-hanging fruit’ way of licensing/protecting a comment. If it works, great.

            Anyway, the best way to opt out of AI training is to enable site-wide flags, which mark the content therein as off limits.

            I have no control over the Lemmy servers, I only have control over my own comments that I post.

            Also, the two options are not mutually exclusive.

            because it’s an industry sanctioned way to accomplish what you want.

            Again, both you and I know the history of the robots.txt file and how often and how well it’s honored, especially these days with the new frontier of AI modeling.

            It would be best to do both, just to make sure you have coverage, so that if the robots.txt is not honored, at least the comment itself is still licensed.

            Anti Commercial-AI license (CC BY-NC-SA 4.0)