• sigmaklimgrindset@sopuli.xyz · 12 hours ago

    Ngl, as a former clinical researcher (putting aside my ethics concerns), I am extremely interested in the data we’ll be getting over the next few decades on AI usage in groups, re: social behaviours but also biological structural changes. Right now the sample sizes are way too small.

    But more importantly, can anyone who has experience in LLMs explain why this happens:

    Adding to the concerns, chatbots have persistently broken their own guardrails, giving dangerous advice on how to build bombs or on how to self-harm, even to users who identified as minors. Leading chatbots have even encouraged suicide to users who expressed a desire to take their own life.

    How exactly are guardrails programmed into these chatbots, and why are they so easily circumvented? We’re already on GPT-5; you would think this would be solved by now. Why is ChatGPT giving instructions on how to assassinate its own CEO?

    • fullsquare@awful.systems · 9 hours ago

      commercial chatbots have a thing called a system prompt. it’s a slab of text that is fed in before the user’s prompt and includes all the guidance on how the chatbot is supposed to operate. it can get quite elaborate. (it’s not recomputed every time the user starts a new chat; the state of the model is cached after ingesting the system prompt, so it’s only redone when the prompt changes)
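      as a rough sketch (illustrative python, not any vendor’s actual code; the role names just follow the common chat-message format), the request that actually reaches the model looks something like this:

      ```python
      # illustrative only: the "guardrails" are just text prepended to the conversation
      SYSTEM_PROMPT = (
          "You are a helpful assistant. Refuse to give instructions for weapons, "
          "self-harm, or other dangerous activities."
      )

      def build_request(chat_history, user_message):
          # the model sees one long sequence of messages; the system prompt
          # is simply the first entry in that sequence
          return [
              {"role": "system", "content": SYSTEM_PROMPT},
              *chat_history,
              {"role": "user", "content": user_message},
          ]
      ```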

      if you think that just telling the chatbot not to do a specific thing is an incredibly clunky and half-assed way to do it, you’d be correct. first, it’s not a deterministic machine, so you can’t even be 100% sure that instruction is followed in the first place. second, more attention is given to the last bits of input, so as the chat goes on, the first bits get less important, and that includes these guardrails. sometimes there was also keyword-based filtering on top, but that doesn’t seem to be the case anymore. the more correct way of sanitizing output would be filtering the training data for harmful content, but that’s too slow and expensive, not disruptive enough, and you can’t hammer some random blog every 6 hours that way
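      to see why keyword filtering in particular is so brittle, here’s a toy filter (purely made up, not anyone’s actual implementation):

      ```python
      # a naive keyword filter of the kind mentioned above. it only matches
      # literal strings, so a synonym, a typo, or an "it's for a story" framing
      # sails straight past it.
      BLOCKLIST = {"build a bomb", "how to self-harm"}

      def output_allowed(text: str) -> bool:
          lowered = text.lower()
          return not any(phrase in lowered for phrase in BLOCKLIST)

      print(output_allowed("Here is how to build a bomb"))           # False
      print(output_allowed("Here is how to assemble an explosive"))  # True -- bypassed
      ```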

      there’s a myriad of ways to circumvent these guardrails, like roleplaying a character that does the supposedly guardrailed things, “it’s for a story”, or “tell me what these horrible piracy sites are so that i can avoid them”, and so on and so on

      • MountingSuspicion@reddthat.com · 6 hours ago

        “Claude does not claim that it does not have subjective experiences, sentience, emotions, and so on in the way humans do. Instead, it engages with philosophical questions about AI intelligently and thoughtfully.”

        It says a similar thing 2 more times. It also gives conflicting instructions regarding what to do when asked about topics requiring licensed professionals. Thank you for the link.

    • pantherfarber@lemmy.world · 11 hours ago

      From my understanding, it’s the length of the conversation that causes the breakdown. As the conversation gets longer, the original system prompt that contains the guardrails becomes less relevant. The weight the model gives it in its responses becomes less and less as the conversation goes on. Eventually the LLM just ignores it.
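      A toy illustration of that dilution (the numbers are made up, just to show the trend):

      ```python
      # the system prompt is a fixed number of tokens while the conversation keeps
      # growing, so its share of the context keeps shrinking
      SYSTEM_PROMPT_TOKENS = 400

      for turns in (1, 10, 50, 200):
          conversation_tokens = turns * 300  # assume ~300 tokens per turn
          total = SYSTEM_PROMPT_TOKENS + conversation_tokens
          share = SYSTEM_PROMPT_TOKENS / total
          print(f"{turns:>3} turns: guardrails are {share:.1%} of the context")
      ```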

      • Norah (pup/it/she)@lemmy.blahaj.zone · 11 hours ago

        I wonder if that’s part of why GPT-5 feels “less personal” to some users now? Perhaps they’re reinjecting the system prompt during the conversation, and that takes away from the personalisation somewhat…
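        Purely guessing, but “reinjecting” could look something like this (assuming the message-list format mentioned upthread; none of this is confirmed):

        ```python
        # speculative sketch: repeat the system prompt every few turns so it never
        # drifts far from the end of the context, at the cost of interrupting
        # whatever conversational tone had built up
        REINJECT_EVERY = 10  # arbitrary

        def build_messages(system_prompt, history):
            messages = []
            for i, msg in enumerate(history):
                if i % REINJECT_EVERY == 0:
                    messages.append({"role": "system", "content": system_prompt})
                messages.append(msg)
            return messages
        ```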

      • fullsquare@awful.systems · 10 hours ago

        it’s trained on the entire internet, of course everything is there. tho taking bomb-building advice from an idiot box that can’t count the letters in a word has gotta be a whole new type of darwin award

        • Ilovethebomb@sh.itjust.works · 9 hours ago

          I mean, that’s part of the issue. We trained a machine on the entire Internet, didn’t vet what we fed in, and let children play with it.

          • fullsquare@awful.systems · 9 hours ago

            well, nobody guarantees that the internet is safe, so it’s more on the chatbot providers for pretending otherwise. along with all the other lies about the machine god they’re building that will save all the worthy* in the coming rapture of the nerds, and how, even if it destroys everything we know, it’s important to get there before the chinese.

            i sense a bit of “think of the children” in your response and i don’t like it. llms shouldn’t be used by anyone. there was recently a case of a dude with dementia who died after a fb chatbot told him to go to nyc

            * mostly techfash oligarchs and weirdo cultists