“Notably, O3-MINI, despite being one of the best reasoning models, frequently skipped essential proof steps by labeling them as “trivial”, even when their validity was crucial.”

  • vane@lemmy.world
    9 hours ago

    This study is bullshit, because they only trace evaluations and not the training process that aligns tokens with probabilities.

      • vane@lemmy.world
        9 hours ago

        Well, every civilisation needs its prophets. Our civilisation built prophet machines that will kill us. We just didn’t get to the killing step yet.

        • froztbyte@awful.systems
          9 hours ago

          yeah but see, these grifters all heard it as “every civilisation needs its profits”. just a shame they suck at that too

          • vane@lemmy.world
            8 hours ago

            No prophet worked for free, and they were always near the rulers and near big money. The story repeats itself, just the times are different and we can instant message each other.