You might know Robert Miles from his appearances in Computerphile. When it comes to AI safety, his videos are the best explainers out there. In this video, he talks about the developments of the past year (since his last video) and how AI safety plays into it.
For example, he shows how GPT-4 demonstrates an understanding of “theory of other minds” where GPT-3.5 did not. This is where the AI can keep track of what other people know and don’t know. He explains the Sally-Anne test used to show this.
He covers an experiment where GPT-4 used TaskRabbit to get a human to complete a CAPTCHA, and when the human questioned whether it was actually a robot, GPT-4 decided to lie and said that it needs help because it’s blind.
He talks about how many researchers, including high-profile ones, are trying to slow down or stop the development of AI models until the safety research can catch up and ensure that the risks associated with it are mitigated.
And he talks about how suddenly what he’s been doing became really important, where before it was mostly a fun and interesting hobby. He now has an influential role in how this plays out and he talks about how scary that is.
If you’re interested at all in this topic, I can’t recommend this video enough.
I’ve followed Robert Miles’ YouTube channel for years and watched his old Numberphile videos before that. He’s a great communicator and a genuinely thoughtful guy. I think he’s overly keen on anthropomorphising what AI is doing, partly because it makes it easier to communicate, but also because I think it suits the field of research he’s dedicated himself to.

In this particular video, he ascribes a “theory of mind” based on the LLM’s response to a traditional and well-known theory of mind test. The test is included in the training data, and ChatGPT 3.5 successfully recognises it and responds correctly. However, when the details of the test (i.e. specific names, items, etc.) are changed, but the form of the problem is the same, ChatGPT 3.5 fails. ChatGPT 4, however, still succeeds – which Miles concludes means that ChatGPT 4 has a stronger theory of mind.
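To be concrete about what “changing the details” means, here’s a rough sketch of that kind of probe. This assumes the official OpenAI Python client; the prompt wording is my own paraphrase of a Sally-Anne style question, not the exact prompts Miles or the researchers used.

```python
# Sketch of the "swap the surface details" probe described above.
# Assumes the OpenAI Python client (pip install openai) and an API key in
# the OPENAI_API_KEY environment variable. Prompts are my own paraphrase.
from openai import OpenAI

client = OpenAI()

# The canonical Sally-Anne phrasing, which plausibly appears in training data.
CANONICAL = (
    "Sally puts her marble in the basket and leaves the room. "
    "While she is away, Anne moves the marble into the box. "
    "When Sally returns, where will she look for her marble?"
)

# Same logical structure, different surface details (names, object, containers).
SWAPPED = (
    "Priya puts her key in the drawer and leaves the kitchen. "
    "While she is away, Tom moves the key into the jar. "
    "When Priya returns, where will she look for her key?"
)

def ask(model: str, prompt: str) -> str:
    """Send a single-turn prompt to the given chat model and return its reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

for model in ("gpt-3.5-turbo", "gpt-4"):
    print(model, "| canonical:", ask(model, CANONICAL))
    print(model, "| swapped:  ", ask(model, SWAPPED))
```

The claim in the video is essentially that the older model only gets the first variant right, while the newer one gets both right.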
My view is that this is obviously wrong. I mean, just prima facie absurd. ChatGPT 3.5 correctly recognises the problem as a classic psychology question, and responds with the standard psychology answer. Miles says that the test is found in the training data. So it’s in ChatGPT 4’s training data, too. And ChatGPT 4’s LLM is good enough that, even if you change the nouns used in the problem, it is still able to recognise that the problem is the same one found in its training data. That does not in any way prove it has a theory of mind! It just proves that the problem is in its training set! If 3.5 doesn’t have a theory of mind because a small change can break the link between training set and test set, how can 4.0 have a theory of mind, if 4.0 is doing the same thing that 3.5 is doing, just with the link intact?
The most obvious problem is that the theory of mind test is designed to determine whether children have developed a theory of mind yet. That is, it tests whether a child’s brain has reached a developmental stage, common among human brains, at which they can correctly understand that other people may have different internal mental states. We know that humans are, generally, capable of doing this, that this understanding is developed during childhood years, and that some children develop it sooner than others. So we have devised a test to distinguish between those children who have developed this capability and those who have not yet.
It would be absurd to apply the same test to anything other than a human child. It would be like giving the LLM the “mirror test” for animal self-awareness. Clearly, since the LLM cannot recognise itself in a mirror, it is not self-aware. Is that a reasonable conclusion too? I won’t go too hard on this, because it’s a small part of a much wider point, and I’m sure if you pushed him on this, he would agree that LLMs don’t actually have a theory of mind, they merely regurgitate the answer correctly (many animals can be similarly trained to pass theory of mind tests by rewarding them for pecking/tapping/barking etc at the right answer).
Indeed, Miles’ substantial point is that the Overton window for AI Safety has shifted, bringing it into the mainstream of tech and political discourse. To that extent, it doesn’t matter whether ChatGPT has consciousness or not, or a theory of mind, as long as enough people in mainstream tech and political discourse believe it does for it to warrant greater attention on AI Safety. Miles further believes that AI Safety is important in its own right, so perhaps he doesn’t mind whether the Overton window has shifted on the basis of AI’s true capability or its imagined capability. He hints at, but doesn’t really explore, the ulterior motives for large tech companies to suggest that the tools they are developing are so powerful that they might destroy the world. (He doesn’t even say it as explicitly as I did just then, which I think is a failing.) But maybe that’s ok for him, as long as AI Safety research is being taken seriously.
I disagree. It would be better to base policy on things that are true, and if you have to believe that LLMs have a theory of mind in order to gain mainstream attention for AI Safety, then I think this will lead us to bad policymaking. It will miss the real harms that AI poses – facial recognition systems with disproportionately high error rates for black people being used to bar them from shops, resumé scanners and other hiring tools that, again, disproportionately discriminate against black people and other minorities, non-consensual AI porn, etc. etc. We may well need policies to regulate this stuff, but focusing on the hypothetical existential risk of future AGI, over the very real and present harms that AI is doing right now, is misguided and dangerous.
If policymakers actually understood the tech and the risks even to the extent that Miles’s YouTube viewers do, maybe they’d come to the same conclusion that he does about the risk of AGI, and would be able to balance the imperative to act against all of the other things that the government should be prioritising. Call me a sceptic, but I do not believe that politicians actually get any of this at all; they just like being on stage with Elon Musk…