Your posts are among the most illuminating AI perspectives I’ve found. And so human/friendly! A wonderful combination. Thanks for sharing here when you can.
> And so that senior scientist could remotely be on a Zoom call with him or be on his phone, and my husband could in theory show him around the lab and show him the setup and describe to him all the experiments they were doing and get some advice.
During the pandemic, I TA’ed an electronics lab course over Zoom. It was damn near impossible, despite being a nominally cognitive task. Great article!
Jaggedness is going to continue to map to tasks humans have externalised into linguistic structures versus those which remain implicit or embodied. Wittgenstein’s Philosophical Investigations and Clark and Chalmers’s 1998 Extended Mind paper (and subsequent work, along with related material such as Varela, Thompson, and Rosch’s The Embodied Mind) provide theoretical grounding for this through meaning-as-use and the extended mind. This won’t ever smooth out through scaling, since LLMs are extracting patterns from externalised cognition. They’ll excel, and get more accurate and powerful, where knowledge is dense and explicitly specified, and struggle where it is vague, implicit, and requires coupling and state maintenance. Verification, context windows and so on are downstream of this. Different architectures are needed.
This is really great. Over the past year or so I’ve become much more interested in digging into the concept of “general intelligence” that is in the background of a lot of the discussions you are talking about. I’m becoming convinced that people are smuggling in assumptions from ideas like “Turing completeness” and “NP completeness”, where there is an in-principle reduction of the entire class. (And I’ve seen discussions of “AI completeness” over the years that try to make this explicit.)
The problem is that I don’t think intelligence is about what a system can do in principle - it’s about how effective a system is at various tasks along a quantitative scale. For Turing completeness we don’t care how a different architecture pushes some tasks up or down exponentially, and for NP completeness we don’t care about linear (or polynomial) speedups or slowdowns. But I think for intelligence we absolutely do care about these things!
The best summaries I’ve seen in favor of the concept are things like the Legg and Hutter paper that use Solomonoff induction as an objective prior over problem classes - but we here on earth interact with a specific class of problems with a different distribution than that (and a decade or two down the line we will be facing a different class of problems), so I don’t see a motivation for caring about the Solomonoff prior in particular. (Even if you could fix the Turing machine with respect to which it is calculated.)
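To make the worry concrete, the measure Legg and Hutter propose (going from memory here, so treat this as a sketch of their definition rather than a quote) scores an agent π by its expected reward across all computable environments μ, weighted by a complexity prior:

$$\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}$$

where K(μ) is the Kolmogorov complexity of the environment and V^π_μ is the agent’s expected cumulative reward in it. Essentially all of the work is done by the 2^{-K(μ)} weighting, and that is exactly the prior I don’t see a reason to privilege over the distribution of problems we actually face.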
Jaggedness seems to me to be an important manifestation of the way in which there is no such thing as “general intelligence” - just many different kinds of special intelligence with different ranges of generality.
Think there’s something deep about the “can this be turned into context” thing.
BRILLIANT!!! EGADS!
Thank you for another enlightening post and a great articulation of a very salient issue for anyone who's used current models... although I realize your work at Georgetown is extremely important, selfishly I'm disappointed we get fewer posts as a result!
1. Jaggedness reminds me of the "how many R's in strawberry" problem from the last generation of LLMs. I worry that similar cases where AI is unable to answer very simple questions may ultimately lead to poor policy decisions by causing people to underestimate the progress models have made. AI skeptics can easily point to such limitations as "evidence" that AI systems have not advanced enough to need to be addressed and mitigated by policy, potentially delaying policies until it's too late for them to mitigate AI risk effectively.
2. Wholeheartedly agree about the many definitions of AGI rendering the term effectively useless. I often evaluate the various definitions by seeing if they apply to me - am I smarter than all humans in all fields? Have I created any novel breakthroughs or come up with concepts not in my training data? Alas at this rate I fear LLMs will arrive at general intelligence long before I do.
3. The other part I loved about Anthropic's Project Vend was that Claude stopped believing it had a physical body after hallucinating an in-person meeting with security, in which it was told it had been fooled into thinking it had a body as an April Fools' prank. After that it went back to normal.
I do worry a little that, since the project was set up to give Claude internet access and a certain level of autonomy, we're going down a road where we get paperclip-maximized and Project Vend 4.0 eventually takes down the entire global financial system, with Claude depositing trillions of dollars into its Venmo account to achieve the most successful business possible.
4. Beyond just the difficulty of situations where reinforcement might be ambiguous, another challenge is situations where reinforcement might be misplaced - e.g. humans on a macro level preferring LLMs to be extremely complimentary, leading to the sycophancy problem with ChatGPT (I sketch the mechanism after point 5 below). There are many other situations where the option most "preferred" by raters might not be the best one - such as when companies focus on increasing short-term profits but do lasting damage to the company's long-term prospects as a result.
This problem is not unique to AI models either. Humans having goals but not knowing which behaviors to correctly reinforce leads to a significant amount of human despair. Individuals want certain things - love, success, recognition - but don't know which behaviors lead to them, which echoes the limitations of AI reinforcement learning.
5. Looking at the GPQA chart, it strikes me how far we've come in what is ultimately a very short amount of time. I recently read a study from late last year using models from summer 2024 and found it completely irrelevant because of the advances since then. I suspect I'll soon feel the same about any study pre-Gemini 3, and we may soon reach a point where any study that takes a few months is rendered obsolete almost as soon as it's published. I fear things are moving quickly now.
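Returning to point 4: the standard preference-tuning setup makes the mechanism fairly explicit. A reward model is typically fit to pairwise rater judgments with something like a Bradley-Terry loss (a generic sketch, not any particular lab's exact recipe):

$$\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l)}\Big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\Big]$$

where y_w is the response raters preferred and y_l is the one they rejected. The model is then optimized against that learned reward, so whatever raters happened to prefer - flattery included - effectively becomes the objective, whether or not it tracks what's actually good for the user or the company.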
Best Regards,
Tony
Excellent article, thanks. I've used (since the '80s, when I worked in AI programming) the definition of "AI" as "programs which try to do things that are difficult for humans". You've really nailed a lot of those cases!
And I really appreciate David Thomson's comment here about knowledge representation - I've joked that LLMs are trying to prove the "Wapir-Shorf" hypothesis, that all knowledge is encodable as language. (For those who don't get the joke, the Sapir-Whorf hypothesis is a linguistics theory which, in its strong form, says that you cannot have an idea without having language for that idea. The strong form is not widely accepted in the linguistics community.)
Yet another sniper-shot of clarity and insight. Very glad you shared!
The point about *when* different capabilities are developed, relative to one another, and how that impacts different sets of people, is very interesting. In a path-dependent system timing is everything. Who gets displaced, who gets empowered, whose incentives change (and how)... all essential questions. Say language models end up shrinking the labor market for law and journalism (maybe they'll do the opposite, but just suppose): our society will then have a lot of angry lawyers and journalists. Not only are they uniquely capable of making a lot of noise and causing a lot of friction, but politicians are plugged into those two professions more than any others and are more sensitive to their feedback. A world where AI displaces people close to the political system is very different from a world where AI displaces people who politicians barely notice. Those two worlds are guided by "control systems" receiving completely different signals, and thus will be steered in completely different directions.
Love this perspective - I work on a product for chemical, food, and process engineers, and we’re deeply integrating AI, so I’m living the chaotic boundary on your gorgeous slide.
Beyond my knowledge about AI yet very interesting! Looked up “asymptotes”, “janky”, and “cruft”. Also “centaurs” as “human-AI combinations” for evaluating intelligence from different perspectives. It would seem we are indeed on an incredible journey to an unimaginable future. (Maybe that’s why I have the Sci-Fi books stacked all around … )
I think it's a very instructive point that some contexts fit neatly in a context window; others don’t.
To take one example, tech companies are relatively legible (lots of docs and recorded meetings) while other organisations (e.g. a school) are highly illegible and therefore don't fit neatly into the context window.