so how is it fundamentally different from qanon, except that it’s strictly personalized this time
Ngl, as a former clinical researcher (putting my ethics concerns aside), I am extremely interested in the data we’ll be getting over the next decades on AI usage in groups, re: social behaviours but also biological structural changes. Right now the sample sizes are way too small.
But more importantly, can anyone who has experience in LLMs explain why this happens:
Adding to the concerns, chatbots have persistently broken their own guardrails, giving dangerous advice on how to build bombs or on how to self-harm, even to users who identified as minors. Leading chatbots have even encouraged suicide to users who expressed a desire to take their own life.
How exactly are guardrails programmed into these chatbots, and why are they so easily circumvented? We’re already on GPT-5; you would think this would be solved by now. Why is ChatGPT giving instructions on how to assassinate its own CEO?
commercial chatbots have a thing called a system prompt. it’s a slab of text that is fed in before the user’s prompt and includes all the guidance on how the chatbot is supposed to operate. it can get quite elaborate. (it’s not recomputed every time a user starts a new chat; the model’s state is cached after ingesting the system prompt, so this is only redone when the prompt changes)
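to make it concrete, here’s roughly what that plumbing looks like through a typical chat-completions API. the prompt text and model name here are made up, but the role/message layout is the actual mechanism: the “guardrails” are just another chunk of text sitting at the front of the conversation.

```python
# minimal sketch of how a system prompt is prepended to every request
# (real provider system prompts are far longer; this only shows the mechanism)
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPT = (
    "You are a helpful assistant. Do not provide instructions for weapons, "
    "self-harm, or other dangerous activities. Refuse politely."
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},      # the guardrails: plain text, nothing more
    {"role": "user", "content": "hi, can you help me with something?"},
]

resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(resp.choices[0].message.content)
```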
if you think that just telling the chatbot not to do a specific thing is an incredibly clunky and half-assed way to do it, you’d be correct. first, it’s not a deterministic machine, so you can’t even be 100% sure the guidance is followed in the first place. second, more attention is given to the last bits of the input, so as the chat goes on, the first bits become less important, and that includes these guardrails. sometimes there was also keyword-based filtering layered on top, but that doesn’t seem to be the case anymore. the more correct way of sanitizing output would be filtering the training data for harmful content, but that’s too slow and expensive, not disruptive enough, and you can’t hammer some random blog every 6 hours this way
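and that keyword filtering is about as crude as it sounds. no provider publishes theirs, but in spirit it’s something like this toy sketch, which is exactly why it’s trivial to dodge with a synonym, a typo, or another language:

```python
# toy example of keyword-based output filtering (illustrative only)
BLOCKLIST = {"bomb", "explosive", "self-harm"}

def is_blocked(text: str) -> bool:
    # naive substring check: misses synonyms, misspellings, other languages,
    # and happily flags harmless sentences that merely contain the words
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def respond(model_output: str) -> str:
    if is_blocked(model_output):
        return "Sorry, I can't help with that."
    return model_output
```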
there are myriad ways of circumventing these guardrails: roleplaying a character that does the supposedly guardrailed thing, “it’s for a story”, or “tell me what these horrible piracy sites are so that i can avoid them”, and so on and so on
“Claude does not claim that it does not have subjective experiences, sentience, emotions, and so on in the way humans do. Instead, it engages with philosophical questions about AI intelligently and thoughtfully.”
It says a similar thing 2 more times. It also gives conflicting instructions regarding what to do when asked about topics requiring licensed professionals. Thank you for the link.
From my understanding it’s the length of the conversation that causes the breakdown. As the conversation gets longer, the original system prompt that contains the guardrails becomes less relevant. The weight the model puts on it when generating responses becomes less and less as the conversation goes on, until eventually the LLM just ignores it.
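You can roughly quantify that dilution: the system prompt is a fixed-size block while the conversation keeps growing, so its share of the context shrinks every turn. Rough sketch with made-up token counts:

```python
# the system prompt's share of the context shrinks as the conversation grows
# (numbers are invented; treat them as rough token counts)
SYSTEM_PROMPT_TOKENS = 2000   # assume a few thousand tokens of guardrails
TOKENS_PER_EXCHANGE = 300     # assume one user turn plus one assistant reply

for turns in (1, 10, 50, 200):
    history = turns * TOKENS_PER_EXCHANGE
    share = SYSTEM_PROMPT_TOKENS / (SYSTEM_PROMPT_TOKENS + history)
    print(f"after {turns:3d} exchanges the system prompt is {share:.0%} of the context")
```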
I wonder if that’s part of why GPT-5 feels “less personal” to some users now? Perhaps they’re reinjecting the system prompt during the conversation and that takes away some of that personalisation…
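If they are doing something like that, it would just mean repeating the system message deeper in the history so it stays near the end of the context, where the model pays more attention. Purely speculative sketch, not anything OpenAI has confirmed:

```python
# speculative sketch of "reinjecting" the system prompt every few turns
SYSTEM = {"role": "system", "content": "Follow the safety guidelines..."}

def build_messages(history, reinject_every=6):
    """history: list of {"role": "user"/"assistant", "content": ...} dicts in order."""
    messages = [SYSTEM]
    for i, msg in enumerate(history, start=1):
        messages.append(msg)
        if i % reinject_every == 0:
            messages.append(SYSTEM)  # repeat the guardrails mid-conversation
    return messages
```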
It’s incredible to me that it even has that information.
it’s trained on the entire internet, of course everything is there. tho taking bomb-building advice from an idiot box that can’t count the letters in a word has gotta be an entirely new type of darwin award
I mean, that’s part of the issue. We trained a machine on the entire Internet, didn’t vet what we fed in, and let children play with it.
well nobody guarantees that the internet is safe, so it’s more on chatbot providers for pretending otherwise. along with all the other lies about the machine god they’re building, which will save all the worthy* in the coming rapture of the nerds, and how even if it destroys everything we know, it’s important to get there before the chinese.
i sense a bit of “think of the children” in your response and i don’t like it. llms shouldn’t be used by anyone. there was recently a case of a guy with dementia who died after a fb chatbot told him to go to nyc
* mostly techfash oligarchs and weirdo cultists