An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates. Experts say the problem is bigger than that
The whole thing can be summed up as follows: they're selling you a hammer and telling you to use it on screws. Once you hammer the screw in, it trashes the wood really badly. Then they're calling the wood-trashing "hallucination" and promising you better hammers that won't do this. Except a hammer is not a tool for screws, dammit; you should be using a screwdriver.
An AI leaderboard suggests the newest reasoning models used in chatbots are producing less accurate results because of higher hallucination rates.
So he’s suggesting that the models are producing less accurate results… because they have higher rates of less accurate results? This is a tautological pseudo-explanation.
AI chatbots from tech companies such as OpenAI and Google have been getting so-called reasoning upgrades over the past months
When are people going to accept the fact that large “language” models are not general intelligence?
ideally to make them better at giving us answers we can trust
Those models are useful, but only a fool trusts their output; to trust it is to be gullible towards it.
OpenAI says the reasoning process isn’t to blame.
Just like my dog isn’t to blame for the holes in my garden. Because I don’t have a dog.
This is sounding more and more like model collapse: models perform worse when trained on the output of other models (a toy sketch of the effect follows).
inb4 sealions asking what my definition of reasoning is in 3…2…1…
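For the curious, here's what that collapse looks like in miniature. This is my own illustration, not anything from the article or from how these labs actually train: recursively fit a Gaussian to samples drawn from the previous generation's fit and watch the estimated spread drift away from the real data's.

```python
import numpy as np

# Toy model-collapse sketch (illustration only, not the article's experiment):
# each "generation" is fitted on samples produced by the previous generation
# instead of on the original data. Over many generations the fitted spread
# tends to drift toward zero, i.e. the model forgets what the real data
# distribution looked like.

rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0             # the "real data" distribution: N(0, 1)
n_samples, n_generations = 50, 200

for gen in range(n_generations):
    samples = rng.normal(mu, sigma, n_samples)  # this generation trains on the previous one's output
    mu, sigma = samples.mean(), samples.std()   # refit the "model" on that output
    if gen % 50 == 0:
        print(f"generation {gen:3d}: fitted sigma = {sigma:.3f}")

print(f"after {n_generations} generations: fitted sigma = {sigma:.3f} (started at 1.0)")
```

Real LLM training is obviously nothing like fitting a two-parameter Gaussian, but the feedback loop has the same shape: estimation error compounds once the training data is itself model output.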
AI is just too nifty a word, even if it's a gross misuse of the term. "Large language model" doesn't roll off the tongue as easily.
The goalposts have shifted a lot in the past few years, but under both the broader and even the narrower definitions, current language models are precisely what was meant by AI and fall squarely into that category of computer program. They aren't broad / general AI, but they are definitely narrow / weak AI systems.
I get that it's trendy to shit on LLMs, often for good reason, but that shouldn't mean we redefine terms just because some system doesn't fit our idealized, under-informed notion of what a technical term means.
Well, I guess I can stop feeling like I'm using the wrong word for them, then.