I often see people working from an outdated understanding of modern LLMs.

This is probably the best interpretability research to date, by the leading interpretability research team.

It’s worth a read if you want a peek behind the curtain on modern models.

  • tabarnaski@sh.itjust.works · 7 months ago

    I think the most interesting thing in this article is that concepts central to semantics (analogy, connotation) or psychology (bias) emerge naturally in multi-layered neural networks of sufficient size. Also that the model can sound like it has different personalities (overconfident, secretive, delusional) if you manipulate the weights or proximity of features. I’d like to see the same kind of study done on Midjourney…
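    For anyone curious what “manipulating a feature” looks like in practice, here’s a minimal sketch. It is not the paper’s actual setup: the paper steered features found by a sparse autoencoder inside Claude 3 Sonnet, while this uses GPT-2 and a random vector standing in for a real feature direction, just to show the mechanics of adding a feature direction to the residual stream.

```python
# Minimal feature-steering sketch. Assumptions: GPT-2 as a stand-in
# model, and a random unit vector in place of a real feature direction
# (in the paper, directions come from a sparse autoencoder).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

d_model = model.config.hidden_size
feature_direction = torch.randn(d_model)        # hypothetical feature
feature_direction /= feature_direction.norm()

steering_strength = 8.0  # positive turns the feature up, negative down

def steer(module, inputs, output):
    # Add the feature direction to the residual stream at every token.
    hidden = output[0] + steering_strength * feature_direction
    return (hidden,) + output[1:]

# Hook an arbitrary middle layer; the paper also steered mid-layer features.
handle = model.transformer.h[6].register_forward_hook(steer)

ids = tok("Tell me about yourself.", return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))
handle.remove()
```

    Crank the strength up far enough on a real feature and you get the kind of personality shifts the paper describes.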

    • GiveMemes@jlai.lu · 7 months ago (edited)

      That’s a chicken-and-egg situation, though. Is the bias the result of a mind, or the result of being trained on data that carries common human biases, all assembled by humans? Are these traits actually measurable, or are we just anthropomorphizing a machine like we do with everything else?

    • magic_lobster_party@kbin.run · 7 months ago

      I would imagine a similar result. Like how the word “cartoon” activates one particular feature, and if you can identify that feature, you can control the level of “cartooniness” by tweaking it.
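      To make the “dial” concrete, here’s a minimal sketch of scaling one sparse-autoencoder feature up or down. The weights are random placeholders; in a real study W_enc/W_dec would be trained on the model’s activations, and cartoon_idx would be whichever feature actually fires on cartoon-like inputs.

```python
# Toy sparse autoencoder with a "feature dial". All weights are random
# placeholders; cartoon_idx is a hypothetical feature index.
import torch

d_model, n_features = 768, 16384
W_enc = torch.randn(d_model, n_features) * 0.02
b_enc = torch.zeros(n_features)
W_dec = torch.randn(n_features, d_model) * 0.02
b_dec = torch.zeros(d_model)

def dial_feature(activation, idx, scale):
    f = torch.relu(activation @ W_enc + b_enc)  # encode to sparse features
    f[..., idx] *= scale                        # turn one feature up/down
    return f @ W_dec + b_dec                    # decode back to model space

cartoon_idx = 1234                    # hypothetical "cartoon" feature
act = torch.randn(d_model)            # an activation from mid-forward-pass
more_cartoony = dial_feature(act, cartoon_idx, scale=5.0)
less_cartoony = dial_feature(act, cartoon_idx, scale=0.0)
```

      Same trick as steering a language model, just applied to whatever activations the image model produces.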