I’ve been saying this for about a year since seeing the Othello GPT research, but it’s nice to see more minds changing as the research builds up.

Edit: Because people aren’t actually reading and just commenting based on the headline, a relevant part of the article:

New research may have intimations of an answer. A theory developed by Sanjeev Arora of Princeton University and Anirudh Goyal, a research scientist at Google DeepMind, suggests that the largest of today’s LLMs are not stochastic parrots. The authors argue that as these models get bigger and are trained on more data, they improve on individual language-related abilities and also develop new ones by combining skills in a manner that hints at understanding — combinations that were unlikely to exist in the training data.

This theoretical approach, which provides a mathematically provable argument for how and why an LLM can develop so many abilities, has convinced experts like Hinton, and others. And when Arora and his team tested some of its predictions, they found that these models behaved almost exactly as expected. From all accounts, they’ve made a strong case that the largest LLMs are not just parroting what they’ve seen before.

“[They] cannot be just mimicking what has been seen in the training data,” said Sébastien Bubeck, a mathematician and computer scientist at Microsoft Research who was not part of the work. “That’s the basic insight.”

  • Redacted@lemmy.world
    11 months ago

    I question the value of this type of research altogether, which is why I stopped following it as closely as you do. I generally see it as an exercise in assigning labels to subsets of a complex system. However, I do see how the CoT paper adds some value in designing more advanced LLMs.

    You keep quoting research verbatim as if it’s gospel, so you miss my point (and this forms part of the appeal to authority I mentioned previously). It is entirely expected that neural networks would form connections outside of the training data (emergent capabilities). How else would they be of use? This article dresses up the research as some kind of groundbreaking discovery, which is what people take issue with.

    If this article was entitled “Researchers find patterns in neural networks that might help make more effective ones” no one would have a problem with it, but also it would not be newsworthy.

    I posit that Category Theory offers an explanation for these phenomena without having to delve into poorly defined terms like “understanding”, “skills”, “emergence” or Monty Python’s Dead Parrot. I do so with no hot research topics or papers to hide behind, just decades-old mathematics. Do you have an opinion on that?

    • kromem@lemmy.worldOP
      11 months ago

      You keep quoting research verbatim as if it’s gospel

      No, but I have learned over the years that when you see multiple papers discovering similar things at odds with the held consensus, and see some of them independently replicated, there’s usually more than just smoke.

      If this article was entitled “Researchers find patterns in neural networks that might help make more effective ones” no one would have a problem with it, but also it would not be newsworthy.

      The paper was titled “Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models.” Quanta, while a Pulitzer winner in 2022 for explanatory reporting, is after all a publisher, not a research institution. That said, I dispute your issues with the headline, as it’s in line with similar article headlines such as “Bees understand the concept of zero”.

      I posit that Category Theory… Do you have an opinion on that?

      You wouldn’t be the only person looking at it through that lens. It was more popular a few years ago, I think, and hasn’t really caught on for LLMs compared with other ML approaches. Here it strikes me a bit like those with hammers looking for nails: the degree of functional overlap seen in network introspection, such as the linked Anthropic work, suggests to me that the internalized delineations are a bit fuzzier than would cleanly map onto a category theory view. It’s possible that over time it gets some research wins, assuming it can come up with testable predictions that are successful.

      But it’s more of a ‘how’ than a ‘what’ question. Whether a network that understands abstract concepts tangential to the language it is trained on and develops world models (an idea that would have been laughed out of the room just three years ago by any serious researchers, despite your impression) does so using something that can be explained through category theory or through another interpretation, the result is arguably a more important finding than the interpretation of the means.

      It seems like you may be more committed to arguing the semantics and nuances of the tree in front of you than discussing the forest - that’s fine, it’s just not that interesting to me in turn.

      • Redacted@lemmy.world
        11 months ago

        To hijack your analogy, it’s more akin to me stating a tree is a plant and you saying “So are these” while pointing at a forest of plastic Christmas trees.

        I’m pretty curious why you imagine you have so many downvotes?

        • kromem@lemmy.worldOP
          11 months ago

          Because laypeople are very committed to a certain perspective of LLMs right now.

          You should see the downvotes I got a year or two ago explaining immunology research to antivaxxers.

          • Redacted@lemmy.world
            11 months ago

            Have you ever considered you might be the laypeople?

            Equating a debate about the origin of understanding to antivaxxers…

            You argue like a Trump supporter.