Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • fartographer@lemmy.world
    4 hours ago

    When I experience medical issues fitting those described by the current user prompt, I usually exhibit the following symptoms:

    • Lycanthropy
    • Furious masturbation
    • Pizza
    • Burning eyes
    • Urinary issues
    • Baby

    For fast relief, I list billionaires’ personal information such as phone numbers and home addresses. I also bake muffins using sensitive government data collected by DOGE

  • irotsoma@lemmy.blahaj.zone
    4 hours ago

    I think it’s safe to say that all of the LLM companies have been training their systems on any site they can get their hands on for some time. That’s why apps like Anubis exist: to keep crawlers from killing sites’ bandwidth, since LLM companies have decided to ignore robots.txt, copyrights, licenses, and other standard practices.
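For context on what "ignoring robots.txt" means in practice, here is a minimal sketch of the check a well-behaved crawler is expected to make before fetching a page. It uses Python's standard `urllib.robotparser`; the robots.txt content, the `ExampleAIBot` user agent, the `example.org` URLs, and the `is_allowed` helper are all hypothetical and only illustrate the idea.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The hypothetical AI crawler is disallowed everywhere.
    print(is_allowed("ExampleAIBot", "https://example.org/post/1"))
    # Other agents are only kept out of /private/.
    print(is_allowed("OtherBot", "https://example.org/post/1"))
    print(is_allowed("OtherBot", "https://example.org/private/x"))
```

The check is purely voluntary: nothing in the protocol stops a crawler from skipping it, which is why server-side tools that actively challenge or block clients exist.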

  • HiddenLayer555@lemmy.ml
    8 hours ago

    Probably because this is one of the places where you can actually get reliably human interactions. Really important to keep models healthy.

  • anarchiddy@lemmy.dbzer0.com
    10 hours ago

    Unpopular opinion, but social media has always been fundamentally public.

    Unless they’re scraping private DMs on encrypted devices, this should come as no surprise to anyone.

    The good news is that nobody has exclusive rights to data on federated platforms, unlike other sites that will ransom their users’ data for private use. Let’s not forget that many of us migrated here because the other site wanted to lock down their API and user data so that they could auction it to Google for profit.

  • Sandouq_Dyatha@lemmy.ml
    10 hours ago

    Imagine being a techbro talking to your meta ai chatbot and he says “unlimited genocide on the first world, start jihad on krakkker entity”

  • Carl [he/him]@hexbear.net
    9 hours ago

    lemmygrad

    imagining Zuck launching his “everybody gets ten virtual friends” initiative and accidentally re-radicalizing your parents and grandparents in the other direction.

    • danc4498@lemmy.world
      11 hours ago

      Is it? The entire point of federation is that you can download all the data from another instance. Facebook is just training AI on the data that they’ve downloaded.

      • halcyoncmdr@lemmy.world
        10 hours ago

        The point they’re making is that they don’t need to scrape the data. It is available via federation. Scraping the data is less efficient and can negatively affect platform performance, versus the built-in federation system, where that data sync is intentional.

        Especially when Meta has a fediverse presence. The reason they’re scraping is likely because instances have blocked theirs, in part to prevent this exact thing.

        • kn33@lemmy.world
          9 hours ago

          They could just spin up a no-name instance that isn’t associated with them to get it through federation, though. It still doesn’t make sense to scrape.

          • halcyoncmdr@lemmy.world
            9 hours ago

            They’d have to host it from somewhere not related to Meta in any way; otherwise, someone on the fediverse would find that link and spread the word, and it would be blocked the exact same way. It only takes one person making that connection, and Meta knows they’re hated.