I often see people with an outdated understanding of modern LLMs.

This is probably the best interpretability research to date, by the leading interpretability research team.

It’s worth a read if you want a peek behind the curtain on modern models.

  • magic_lobster_party@kbin.run · 7 months ago

    I would imagine a similar result: the word “cartoon” activates one particular feature, and once you’ve identified that feature, you can control the level of “cartooniness” by tweaking its activation.
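
    A minimal sketch of what that tweaking might look like in PyTorch, assuming you already have a feature direction (e.g. a decoder column from a sparse autoencoder trained on a layer’s activations). All names here are illustrative, and the `Linear` layer is a toy stand-in for a real transformer block:

    ```python
    import torch

    # Hypothetical "cartoon" feature direction; in practice this would come
    # from an interpretability method such as a sparse autoencoder.
    d_model = 512
    cartoon_direction = torch.randn(d_model)
    cartoon_direction /= cartoon_direction.norm()

    def make_steering_hook(direction: torch.Tensor, scale: float):
        """Return a forward hook that adds `scale * direction` to a layer's output."""
        def hook(module, inputs, output):
            return output + scale * direction
        return hook

    # Toy stand-in for one transformer block; with a real model you'd hook
    # the residual stream instead (e.g. model.transformer.h[20]).
    layer = torch.nn.Linear(d_model, d_model)

    # Positive scale dials the concept up; negative dials it down.
    handle = layer.register_forward_hook(
        make_steering_hook(cartoon_direction, scale=4.0)
    )

    x = torch.randn(1, d_model)
    steered = layer(x)  # output nudged along the "cartooniness" direction
    handle.remove()
    ```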