• DandomRude@lemmy.world · 2 hours ago

    Apart from the fact that these hallucinations just cannot be fixed, they don't even seem to be the only major problem at the moment: ChatGPT 5, for example, often seems to live in the past and regularly fails to recognize when a question needs to be answered based on current data.

    For example, when I ask who the US president is, I regularly get the answer that it is Joe Biden. When I ask who the current German chancellor is, I get the answer that it is Olaf Scholz. This raises the question of what LLMs can be used for if they cannot even answer these very basic questions correctly.

    The error rate simply seems far too high for use by the general public, and that's before even considering hallucinations; these are just answers drawn from outdated or unreliable data.

    And that, in my opinion, is the fundamental problem that also causes LLMs to hallucinate: they are unable to understand either the question or their own output. Their output is merely a probability calculation based on recurring patterns. LLMs are fundamentally incapable of understanding the logic behind those patterns; they only recognize the pattern itself, not the underlying logic of the word order in a sentence. So they have no concept of right or wrong, only a statistical model of how words follow one another in sentences. The meaning of a sentence cannot be fully captured this way, which is why LLMs can only somewhat deal with sarcasm, for example: if the majority of sarcastic sentences in their training data have /s written after them, that can be picked up as an indicator of sarcasm (this way they can at least identify a sarcastic question if it contains /s).
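
    To illustrate what "only a statistical model based on the sequence of words" means, here is a minimal toy sketch (a simple bigram counter over a made-up corpus; real LLMs are neural networks over tokens, but the point that they score word order rather than truth is the same):

```python
from collections import Counter, defaultdict

# Toy corpus: the model only ever sees which words follow which,
# never whether a sentence is currently true.
corpus = [
    "the president is joe biden",
    "the president is joe biden",
    "the president is donald trump",
]

# Count word-to-next-word transitions (a bigram model).
transitions = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current_word, next_word in zip(words, words[1:]):
        transitions[current_word][next_word] += 1

def most_likely_next(word):
    """Return the statistically most frequent follower of `word`."""
    followers = transitions[word]
    return followers.most_common(1)[0][0] if followers else None

print(most_likely_next("is"))   # -> "joe" (seen twice, "donald" only once)
print(most_likely_next("joe"))  # -> "biden"
```

    The toy model continues "joe" with "biden" simply because that pairing was most frequent in what it saw, not because it has any notion of who currently holds the office.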

    Of course, this does not mean that there are no use cases for LLMs, but it does show how excessively oversold AI is.

  • BlameTheAntifa@lemmy.world · 3 hours ago

    Literally everything a generative AI outputs is a hallucination. It is a hallucination machine.

    Still, I like this fix. Let us erase AI.

  • ignirtoq@fedia.io · 3 hours ago

    “Users accustomed to receiving confident answers to virtually any question would likely abandon such systems rapidly,” the researcher wrote.

    While there are “established methods for quantifying uncertainty,” AI models could end up requiring “significantly more computation than today’s approach,” he argued, “as they must evaluate multiple possible responses and estimate confidence levels.”

    “For a system processing millions of queries daily, this translates to dramatically higher operational costs,” Xing wrote.
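
    For reference, "evaluating multiple possible responses and estimating confidence levels" might look roughly like this sketch (generate() is a hypothetical function standing in for one sampled model response; the agreement measure is my own simplification, not anything from the article):

```python
from collections import Counter

def estimate_confidence(generate, prompt, n_samples=5):
    """Sample several responses and use agreement as a rough confidence score.

    `generate` is assumed to be a callable that returns one sampled
    response string per call (e.g. a wrapper around an LLM API).
    """
    responses = [generate(prompt) for _ in range(n_samples)]

    # Crude normalization so trivially different phrasings can still match.
    normalized = [r.strip().lower() for r in responses]

    answer, count = Counter(normalized).most_common(1)[0]
    confidence = count / n_samples  # fraction of samples that agree
    return answer, confidence

# Usage sketch: answer only when the samples mostly agree.
# answer, confidence = estimate_confidence(my_llm_call, "Who is the US president?")
# if confidence < 0.8:
#     answer = "I don't know."
```

    Note that this takes n_samples model calls per question, which is exactly where the "dramatically higher operational costs" come from.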

    1. They already require substantially more computation than search engines.
    2. They already cost substantially more than search engines.
    3. Their hallucinations make them unusable for any application beyond novelty.

    If removing hallucinations means Joe Shmoe isn't interested in asking it questions a search engine could already answer, but it brings even 1% of the capability promised by all the hype, they would finally actually have a product. The good long-term business move is absolutely to remove hallucinations and add uncertainty. Let's see if any of them actually do it.

    • DreamlandLividity@lemmy.world · 2 hours ago

      They probably would if they could. But removing hallucinations would remove the entire AI. The AI is not capable of anything other than hallucinations that are sometimes correct. They also can’t give confidence, because that would be hallucinated too.

  • theunknownmuncher@lemmy.world · 4 hours ago

    I know everyone wants to be like “ha ha told you so!” and hate on AI in here, but this headline is just clickbait.

    Current AI models have been trained to give a response to the prompt regardless of confidence, causing the vast majority of hallucinations. By incorporating confidence into the training and responding with “I don’t know”, similar to training for refusals, you can mitigate hallucinations without negatively impacting the model.
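
    As a rough sketch of what thresholding on confidence could look like at inference time (using average token log-probabilities as a stand-in for confidence; generate_with_logprobs() is hypothetical, and no vendor necessarily does it this way):

```python
import math

def answer_or_abstain(generate_with_logprobs, prompt, threshold=0.5):
    """Answer only when the model's own token probabilities look confident.

    `generate_with_logprobs` is assumed to return (text, token_logprobs),
    where token_logprobs is a list of log-probabilities, one per generated
    token. The threshold is arbitrary and would need tuning in practice.
    """
    text, token_logprobs = generate_with_logprobs(prompt)

    # Geometric-mean token probability as a crude confidence proxy.
    avg_logprob = sum(token_logprobs) / max(len(token_logprobs), 1)
    confidence = math.exp(avg_logprob)

    if confidence < threshold:
        return "I don't know."
    return text
```

    Token probabilities are only a crude proxy for correctness, which is why the point above is to train the "I don't know" behaviour in, like refusals, rather than bolt it on afterwards.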

    If you read the article, you'll find the "destruction of ChatGPT" claim is actually nothing more than the "expert" assuming that users will just stop using AI if it starts occasionally telling them "I don't know". It's not any kind of technical limitation preventing hallucinations from being solved; in fact, the "expert" agrees that hallucinations can be solved.

  • paraphrand@lemmy.world · 5 hours ago

    This is what I hang my assumption that they won't reach AGI on. It's why hearing them raise money on wild hype is annoying.

    Unless they come up with a totally new foundational approach. Also, I’m not saying current models are useless.

  • Corelli_III@midwest.social · 6 hours ago

    how many times are these guys going to release a paper just because one of them thought to look up “stochastic”

    fucking bonkers, imagine thinking this is productive

  • 🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 @pawb.social · 6 hours ago

    Bull fucking shit. If a college kid can train it not to just make up new and novel information and to actually pass on real shit for a single school project, I don't know how the big companies can't do the same.

    • Thorry@feddit.org · 2 hours ago

      Then you don't understand how a modern LLM-based 'AI' functions, and I don't blame you. It's extra confusing because putting data into the thing is called 'training' and the marketing materials say it's artificial intelligence. So why can't we just train our artificial intelligence to do better?

      Well, first of all, because an LLM isn't intelligent at all. That's just a term that gets applied to a lot of stuff. A few lines of code in a video game so an enemy avoids your shots is called AI. A simple decision-tree-based system is called AI. A lot of things have the term AI slapped on them that aren't intelligent at all. The same applies to LLM-based chat bots: they get called AI but contain no form of intelligence.

      So what is an LLM exactly, then? Simply put, it's a machine that predicts the next word based on the words that came before. It's been fed a whole bunch of text from the internet, books, and any other source they could get their hands on. From this data they create a model which, given a bunch of words, poops out what the next word would most likely be. In practice there's a lot more to it, but this is the core of the thing. And what we learned was that if you create a model large enough and feed it a lot of text, it will happily supply a bunch of text that follows. Put this in a chat format and you can ask it a question and it will give an answer.
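
      To make that concrete, here is a minimal sketch of asking a small real model for its most likely next words (this assumes the Hugging Face transformers library and the small public gpt2 checkpoint; production chatbots are far bigger and wrapped in a lot more machinery, but the next-word step is the same idea):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small, publicly available model used purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The current chancellor of Germany is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # a score for every token in the vocabulary

# Turn the scores at the last position into probabilities and show the top 5.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, 5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```

      Whatever comes out reflects which words most often followed that phrase in the training text, nothing more.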

      So the name LLM stands for large language model. Like I said, it's a large model, which means it has been trained on a lot of data and knows the relation between a lot of words. The language part is because the model is specifically for natural language: its source data is natural language and its output is natural language. The training is the part where they feed it all of that data.

      OK, why does this mean we can't train it not to lie anymore? Because the core of the system is predicting which words come next. The LLM doesn't know what words mean; it doesn't understand anything. It's just putting together a jigsaw puzzle and slotting in the pieces where they fit. It generates text because its internal calculations make those words likely given the previous words. So when asked a question, it will most likely return a properly formatted and grammatically correct answer. There is, however, no relation between the answer and the truth. It literally hallucinates every answer it gives, and because the source data that was put into it hopefully contained a lot of truths, the answer has a chance of being true. But it also has a chance of not being true, and if the source data didn't contain anything similar enough to the question, all bets are off and the answer has a high likelihood of being false.

      So what can be done to fix it? Early on, the thinking was to increase the amount of data put into the model and increase the amount of resources the model can use. So let it "know" more and feed it more data. This helps avoid the case where the question isn't in the source data, or where the model doesn't recognize the question as similar enough, so it should reduce the wrong outputs, right? Alas, it turned out not to. It helps a little, but the effort grows exponentially while the results only get mildly better. More source data also means more noise in the data: more truthful answers to a question, but also more false ones. And it turned out that feeding a model output from earlier models especially messes up the end result.

      To get it to behave properly, one would have to feed an infinite amount of data into it. And that data simply isn't there. All of the good quality data has already been collected and put in. So this is about as good as it gets. AI companies are going the pump-in-more-resources route, but they are fast running into diminishing returns.

      This is a really short and simplified explanation. There is a lot more to it, and people are making entire careers in this field. But the core principle holds: these systems only put in words that seem to fit, true or not. That is the fundamental functionality of the system, so it will always be prone to hallucinations.

      So when AI companies say in their marketing, "Just look at where we were a few years ago and where we are now; imagine where we will be in a couple of years!", hopefully you now know to take that with a lot of doubt. They are running into hard limits. Infinite growth isn't a thing, and past results are not a good indication of future results. They need a really big breakthrough, otherwise this technology will mostly fail.

      • I literally based what I said on seeing papers and video essays by students using generative AI to perform specific tasks and overcoming this issue. It's not just about the data it is "learning" from, but also about how you "reward" it for doing what you intend it to do. Let it figure out how to win at a game and it will cheat until you start limiting how it is allowed to win.
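
      As a toy illustration of that reward point (made-up numbers, nothing to do with any specific student project): an optimizer picks whatever maximizes the score you give it, so the score has to encode what you actually mean by winning.

```python
# Toy "game": the agent picks a strategy and gets a score.
# The naive reward only measures points, not whether the win was fair.
STRATEGIES = {
    "play_normally": 1.0,   # modest score, the intended behaviour
    "exploit_glitch": 5.0,  # big score, clearly not what we meant by "winning"
}

def best_strategy(strategies, reward_fn):
    """Pick whichever strategy maximizes the given reward function."""
    return max(strategies, key=reward_fn)

# Reward only the raw score: the optimizer happily picks the exploit.
print(best_strategy(STRATEGIES, lambda s: STRATEGIES[s]))  # -> "exploit_glitch"

# Build the limits into the reward itself and the choice changes.
def shaped_reward(strategy):
    penalty = 10.0 if strategy == "exploit_glitch" else 0.0
    return STRATEGIES[strategy] - penalty

print(best_strategy(STRATEGIES, shaped_reward))  # -> "play_normally"
```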