Hey all,

I’m at my wits’ end trying to find an article I saw a while back, either on Lemmy or Reddit, and I figured this community might know it.

It was about a research team somehow cracking open an LLM and looking at the way it does calculations, and I remember there was a sort of flowchart in the article, with the LLM grouping interim results into weird-ass categories, like “between 26-ish and 34-ish”, and then using a separate process for figuring out the last digit.

I think the article might have been linked in response to a question like why LLMs mess up the last digit of number calculations.

Does any of this ring a bell for anyone? I’ve tried searching for it every way I can phrase the idea, but all I get is a flood of ads and guides about “how to do math in LLMs”.

  • hendrik@palaver.p3x.de · 3 hours ago

    That’s not entirely correct. They kinda “do maths”. I tried to google an answer to OP’s question, and there are a bunch of papers showing how LLMs develop internal circuits to handle numbers. (I didn’t find that specific article, though.) Of course everything is prediction with LLMs, but it seems they do try to form some model of how to do base-10 maths. They’re certainly bad at it and nowhere near a real calculator. And you’re right, what people usually do is give them tool access: either a proper calculator, or more often a Python sandbox, with a prompt telling the model to write a Python snippet for any arithmetic. But the usual models can also add and multiply smaller numbers without anything in the background. That’s not much of an achievement, though, since they can simply memorize the basic multiplication tables.
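    Rough sketch of that tool pattern, so it’s easier to picture (the model call here is just a stub standing in for whatever API you’d actually use; the only “real” part is the restricted evaluator and the loop around it):

        # The model is prompted to answer arithmetic questions by emitting a plain
        # expression; a tiny sandboxed evaluator computes it and the exact result
        # is fed back into the conversation.
        import ast
        import operator

        # Only plain arithmetic is allowed, so the "sandbox" can't run arbitrary code.
        OPS = {
            ast.Add: operator.add,
            ast.Sub: operator.sub,
            ast.Mult: operator.mul,
            ast.Div: operator.truediv,
            ast.Pow: operator.pow,
            ast.USub: operator.neg,
        }

        def safe_eval(expr: str) -> float:
            """Evaluate an arithmetic expression without handing over full Python."""
            def walk(node):
                if isinstance(node, ast.Expression):
                    return walk(node.body)
                if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                    return node.value
                if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                    return OPS[type(node.op)](walk(node.left), walk(node.right))
                if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
                    return OPS[type(node.op)](walk(node.operand))
                raise ValueError("not plain arithmetic")
            return walk(ast.parse(expr, mode="eval"))

        def fake_llm(prompt: str) -> str:
            # Stand-in for the actual model call; a real setup would prompt the
            # model to reply with an expression for the tool to evaluate.
            return "4382 * 9177"

        question = "What is 4382 times 9177?"
        expr = fake_llm(question)
        print(expr, "=", safe_eval(expr))  # this exact value goes back to the model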

    • James R Kirk@startrek.website · 3 hours ago

      Correct me if I’m wrong, but what you’re describing still sounds like probabilistic output, right? Meaning it’s not the same output every time (and therefore it can’t actually be doing math).

      • hendrik@palaver.p3x.de · 36 minutes ago

        The randomization comes in later. The model weights themselves don’t change; they’re just numbers that get multiplied. So the model will always be exactly 94% certain that 5 times 12 equals 60. (Percentage entirely made up by me, and I’m oversimplifying.)

        I think what you mean is the sampler, for example the temperature setting. That’s added on top: it shakes things up and occasionally makes the LLM output something other than the highest-confidence token. And you’re right, cranking up the temperature makes the answers more random. But if you use, say, ChatGPT on default settings, it should almost always give the correct answer to very basic arithmetic with low-ish numbers; I’ve never seen it do anything else. And you can always set the temperature to zero, in which case the sampler gives you deterministic output: always the same answer for the same input.
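        Toy example of what I mean, with entirely made-up numbers (these “logits” aren’t from any real model, it’s just to show where the randomness enters):

            # The forward pass always gives the same scores for the same input;
            # only the sampler on top adds randomness.
            import math
            import random

            # Invented scores for the next token after "5 * 12 =". These never change.
            logits = {"60": 6.0, "50": 1.5, "612": 0.5}

            def probabilities(scores, temperature):
                if temperature == 0:  # temperature zero: just take the top token
                    best = max(scores, key=scores.get)
                    return {tok: (1.0 if tok == best else 0.0) for tok in scores}
                exps = {tok: math.exp(s / temperature) for tok, s in scores.items()}
                total = sum(exps.values())
                return {tok: e / total for tok, e in exps.items()}

            def sample(scores, temperature):
                probs = probabilities(scores, temperature)
                return random.choices(list(probs), weights=list(probs.values()))[0]

            print(probabilities(logits, 1.0))               # identical on every run
            print([sample(logits, 0.0) for _ in range(5)])  # always "60"
            print([sample(logits, 2.0) for _ in range(5)])  # occasionally not "60"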

        That said, I also tried decimal numbers, large values, proper equations, trigonometry and division, and there ChatGPT definitely can’t give reliable answers. It’s kind of surprising (to me) that it sometimes seems to pull it off, or at least has some vague idea of where to go. But it seems to me that elementary-school level is the limit.

        What the papers say is that there’s more going on inside: the models don’t just memorize answers or resort to random guessing. I’ve only skimmed those papers, so I don’t know the exact details, but it seems the models form some “understanding” of how addition works. We know they’re not specifically built to be calculators, and from my experience they’re not good at it, but they’re not just rolling dice either.

        (And transformer-based large language models (plus added memory) are Turing complete, so… theoretically they could be an accurate calculator 😂 just an absurdly idiotic and wasteful one…)

        Ultimately, all of this is hard to compare to how a human does maths. I also memorized my multiplication tables, but beyond that I work through several steps in my head, pretty much the way I learned it in school. An LLM, not so much; we’d have to properly read the papers to find out how they do it, but they’ve probably inferred different ways of arriving at answers… Unless we’re talking about the “reasoning” modes, but I don’t think those do proper reasoning as of today.