Same, I just tried deepseek-R1 on a question I invented as an AI benchmark. (No AI has come remotely close to answering this simple question correctly, though obviously I won’t reveal the question here.) Anyway, R1 kept making wrong assumptions and then second-guessing itself.
I actually do think the “reasoning” approach has potential though. If an LLM produces the right answer only half the time, then “reasoning” effectively gives it multiple attempts, and the odds that at least one attempt is correct go up quickly. Still, the results so far are unimpressive.
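To make the “multiple attempts” intuition concrete, here’s a minimal sketch. It assumes each attempt is an independent coin flip with success probability p, which is a crude model of what sampling-plus-reasoning buys you, not how any real model works:

```python
def p_at_least_one_correct(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds."""
    return 1 - (1 - p) ** k

# With a 50% per-attempt success rate, extra attempts help fast:
for k in (1, 2, 4, 8):
    print(f"{k} attempts -> {p_at_least_one_correct(0.5, k):.4f}")
# 1 attempts -> 0.5000
# 2 attempts -> 0.7500
# 4 attempts -> 0.9375
# 8 attempts -> 0.9961
```

The catch is that independence is a generous assumption: a model’s repeated attempts tend to share the same blind spots (as with R1’s wrong assumptions above), so the real gain is smaller than this math suggests.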