Anthropic has developed an AI 'brain scanner' to understand how LLMs work and it turns out the reason why chatbots are terrible at simple math and hallucinate is weirder than you thought

cm0002@lemmy.world · 2 months ago


Imgonnatrythis@sh.itjust.works · 2 months ago

“Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95,” the MIT article explains.

That is precisely how I do math. Feel a little targeted that they called this odd.
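For anyone who wants to see the two pathways from the quote side by side, here is a rough sketch. It only illustrates the arithmetic idea; it is not a claim about what actually happens inside Claude. The function names are made up, and rounding each operand to the nearest 5 is my own assumption, chosen so the rough estimate is always close enough to pin down a single answer.

```python
def rough_sum(a: int, b: int) -> int:
    """Approximate pathway: add the operands after rounding each to the nearest 5.

    Assumption for this sketch: nearest-5 rounding keeps the estimate
    within 4 of the true sum for non-negative integers.
    """
    def nearest_five(x: int) -> int:
        return (x + 2) // 5 * 5
    return nearest_five(a) + nearest_five(b)


def last_digit(a: int, b: int) -> int:
    """Exact pathway: work out only the final digit of the sum."""
    return (a % 10 + b % 10) % 10


def combine(estimate: int, digit: int) -> int:
    """Pick the number ending in `digit` that lies closest to `estimate`."""
    below = estimate - ((estimate - digit) % 10)  # largest candidate <= estimate
    above = below + 10                            # next candidate up
    return min((below, above), key=lambda n: abs(n - estimate))


def add_two_paths(a: int, b: int) -> int:
    """Combine the rough estimate with the exact last digit."""
    return combine(rough_sum(a, b), last_digit(a, b))


# The example from the quote: "92ish" plus "ends in a 5".
print(combine(92, 5))
print(add_two_paths(36, 59))
```

The point of the split is that neither pathway has to be exact on its own: the estimate only needs to land within a few units, and the last digit settles the rest.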

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 @pawb.social · 2 months ago

I use a calculator. An AI should be one too, and shouldn’t need to do weird shit to do math.

Jakeroxs@sh.itjust.works · 2 months ago

Function calling is a thing chatbots can do now

sapetoku@sh.itjust.works · 2 months ago

A regular AI should use a calculator subroutine, not try to discover basic math every time it’s asked something.
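A minimal sketch of the calculator-subroutine idea, which is roughly what function calling makes possible: the model emits a structured request and ordinary code does the arithmetic. The JSON shape, the tool name `calculator`, and the helper `handle_tool_call` are all hypothetical and not taken from any particular vendor's API.

```python
import ast
import json
import operator

# Only plain arithmetic operators are allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression: str):
    """Evaluate a basic arithmetic expression without using eval()."""
    def ev(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)


# Hypothetical registry of tools the chatbot is allowed to call.
TOOLS = {"calculator": calculate}


def handle_tool_call(raw: str):
    """Dispatch a tool call in an assumed {"name": ..., "arguments": {...}} format."""
    call = json.loads(raw)
    return TOOLS[call["name"]](**call["arguments"])


# Instead of guessing the sum token by token, the model would emit something like:
request = '{"name": "calculator", "arguments": {"expression": "36 + 59"}}'
print(handle_tool_call(request))
```

The result would then be handed back to the model, which only has to phrase the answer.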

Goretantath@lemm.ee · 2 months ago

Yes, you shove it off onto something else to do for you instead of doing it yourself, and the AI doesn’t.

Imgonnatrythis@sh.itjust.works · 2 months ago

Fascist. If someone does maths differently than your preference, it’s not “weird shit”. I’m facile with mental math despite what’s perhaps a non-standard approach, and it’s quite functional to be able to perform simple to moderate levels of mathematics mentally without relying on a calculator.

radau@lemmy.dbzer0.com · 2 months ago

Wtf hahahahaha

🇰 🌀 🇱 🇦 🇳 🇦 🇰 🇮 @pawb.social · 2 months ago

I am talking about the AI. It’s already a computer. It shouldn’t need to do anything other than calculate the equations. It doesn’t have a brain, it doesn’t think like a human, so it shouldn’t need any special tools or ways to help it do math. It is a calculator, after all.

artichoke99@lemm.ee · 2 months ago

OK, but the LLM is evidently shit at math, so its “non-standard” approach should still be adjusted.

Lemminary@lemmy.world · 2 months ago

“Fascist”

Wat

Imgonnatrythis@sh.itjust.works · 2 months ago

Thought police mate. You don’t tell people the way they think is weird shit just because they think differently than you. Break free from that path.

Lemminary@lemmy.world · edit-2 2 months ago

The reply was literally “*I* use a calculator” followed by “AI should use one too”. Are you suggesting that you’re an LLM or how did you cut a piece of cloth for yourself out of that?

GSV_Sleeper_Service@lemmy.world · 2 months ago

Calling someone a fascist for that is obviously a bit OTT but you’ve ignored the “do weird shit” part of the response so it wasn’t literally what you said. Taking the full response into account you can easily interpret it as “I don’t bother with mental maths but use a calculator instead, anyone who isn’t like me is weird as shit”

That is a bit thought police-y

Lemminary@lemmy.world · edit-2 2 months ago

I didn’t ignore it, I just interpret it differently as in, “I don’t need to do this unusual stuff everyone does without a calculator”. Calling something weird doesn’t necessarily mean it’s off-color or that it’s a trait the other person has. In my use case, weird just means unexpected or counterintuitive, and maybe complex enough that I can’t bother with describing it properly. I know because I use it that way too. Weird doesn’t have to mean a third eye on your face every time. I mean, doing the weird math thing is taught in school as a strategy.

I do want to mention that it’s not the first time I’ve seen a visceral reaction to a passing comment. I usually see this from marginalized groups, and I can assure you, both Kolanki and I are part of those too. And knowing his long comment history, I sincerely doubt he meant anyone is weird as shit.

And even if it’s a bit thought-policey, how does that warrant calling someone a fascist and going off on them like that? That’s also a bit weird (as in odd).

ClamDrinker@lemmy.world · 2 months ago

Except as you demonstrated, it requires quite a few leaps of interpretation, assuming the worst interpretations of OP’s statement, which is why it’s silly. OP clearly limited their statement to themselves and AI.

Now if OP said, “everyone should use a calculator or die”, maybe then it would have been a valid response.

OozingPositron@feddit.cl · 2 months ago

JayGray91@lemmy.zip · 2 months ago

I think it’s odd in the sense that it’s supposed to be software, so it should already know what 36 plus 59 is in a picosecond instead of doing mental arithmetic like we do

At least that’s my takeaway

shawn1122@lemm.ee · edit-2 2 months ago

This is what the ARC-AGI test by Chollet has also revealed of current AI / LLMs. They have a tendency to approach problems with this trial and error method and can be extremely inefficient (in their current form) with anything involving abstract / deductive reasoning.

Most LLMs do terribly at the test with the most recent breakthrough being with reasoning models. But even the reasoning models struggle.

ARC-AGI is simple, but it demands a keen sense of perception and, in some sense, judgment. It consists of a series of incomplete grids that the test-taker must color in based on the rules they deduce from a few examples; one might, for instance, see a sequence of images and observe that a blue tile is always surrounded by orange tiles, then complete the next picture accordingly. It’s not so different from paint by numbers.

The test has long seemed intractable to major AI companies. GPT-4, which OpenAI boasted in 2023 had “advanced reasoning capabilities,” didn’t do much better than the zero percent earned by its predecessor. A year later, GPT-4o, which the start-up marketed as displaying “text, reasoning, and coding intelligence,” achieved only 5 percent. Gemini 1.5 and Claude 3.7, flagship models from Google and Anthropic, achieved 5 and 14 percent, respectively.

https://archive.is/7PL2a
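To make the grid example in the quoted passage concrete, here is a toy sketch of the "blue tile is always surrounded by orange tiles" rule. The grid encoding ("B", "O", "." for blank) and the function name are my own assumptions, not the actual ARC-AGI format. Note that this only applies a rule that has already been deduced; working out the rule from a few examples is the part the test actually measures.

```python
def surround_blue(grid):
    """Return a copy of the grid with every blank neighbour of a blue tile turned orange."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "B":
                continue
            # Visit the eight surrounding cells, skipping the blue tile itself.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        if grid[nr][nc] == ".":
                            out[nr][nc] = "O"
    return out


puzzle = [
    [".", ".", ".", "."],
    [".", "B", ".", "."],
    [".", ".", ".", "."],
]
for row in surround_blue(puzzle):
    print(" ".join(row))
```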

Goretantath@lemm.ee · 2 months ago

It’s funny because I approach life with a trial-and-error method too; it’s not efficient, but I get the job done in the end. I always see others who don’t and give up, like all the people bad at computers who ask the tech support at the company to fix the problem instead of thinking about it for two secs, and wonder where life went wrong.

Echo Dot@feddit.uk · edit-2 2 months ago

But you’re doing two calculations now, an approximate one and another one on the last digits. Since you’re going to do the approximate calculation anyway, you might as well just do the accurate calculation and be done in one step.

This solution, while it works, has the feeling of evolution. No intelligent design, which I suppose makes sense considering the AI did essentially evolve.

Imgonnatrythis@sh.itjust.works · 2 months ago

Appreciate the advice on how my brain should work.

sapetoku@sh.itjust.works · 2 months ago

“No intelligent design, which I suppose makes sense considering the AI did essentially evolve.”

And that made a lot of people angry