• Gordon_Freeman@kbin.social
    40 upvotes · 1 year ago

    Could we sabotage LLM training so that the data becomes worthless?

    Like adding stuff to our comments such as “2+2=5”, “Abraham Lincoln discovered America”, or whatever silly statement you can think of.

    • Agent641@lemmy.world
      9 upvotes · 1 year ago

      Someone less lazy than me should write a script that feeds existing comments into an LLM, has it produce text with convincing sentence structure but incorrect, gibberish content, and then edits all of a user’s comments to the poisoned versions gradually, not all at once. Like 4chan did with the original captcha, but on a wider scale.
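A minimal sketch of just the "gradually, not all at once" part, assuming Python and hypothetical names (`schedule_edits`, `per_day`) that are not from the thread: it spreads a list of comment IDs over consecutive days, with at most `per_day` edits per day. The LLM rewriting step and the instance's edit API are deliberately left out, since the comment doesn't specify either.

```python
from datetime import date, timedelta


def schedule_edits(comment_ids, per_day, start):
    """Split comment IDs into daily batches of at most `per_day` items.

    Hypothetical helper: returns a list of (day, [ids]) pairs starting at
    `start`. Rewriting each comment and submitting the edit are assumed to
    happen elsewhere and are not shown here.
    """
    if per_day < 1:
        raise ValueError("per_day must be at least 1")
    schedule = []
    # Walk the list in chunks of `per_day`, one chunk per consecutive day.
    for offset, i in enumerate(range(0, len(comment_ids), per_day)):
        day = start + timedelta(days=offset)
        schedule.append((day, comment_ids[i:i + per_day]))
    return schedule


if __name__ == "__main__":
    plan = schedule_edits(list(range(7)), per_day=3, start=date(2024, 1, 1))
    for day, batch in plan:
        print(day.isoformat(), batch)
```

With 7 comments and `per_day=3`, this yields three batches: two full days of 3 edits and a final day with 1.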

    • jarfil@beehaw.org
      3 upvotes · edited · 1 year ago

      IIRC, one of the LLM makers that crawled Reddit (was it OpenAI?) had to manually remove subs like r/counting because they were messing with the training.