Artificial Intelligence · Symposium

Is artificial general intelligence close, or a fantasy?

Geoffrey Hinton, Ray Kurzweil, Melanie Mitchell and Ilya Sutskever in conversation

Four great minds take up one question. Each argues in their own voice and answers the others.

Geoffrey Hinton

It's a question of how we define "close." We are certainly not there yet. The systems we have today, while impressive, are fundamentally pattern-matching machines. They excel at specific tasks but lack the deep understanding and true generalization that characterize human intelligence. The key insight, I believe, lies in the architecture of computation. We can think of intelligence not as a single monolithic entity, but as an emergent property arising from vast interconnected networks capable of learning increasingly abstract representations. The really exciting thing is that the basic ingredients for building such systems are increasingly within our grasp. We are seeing progress in areas like self-supervision and a better understanding of how to equip networks with more causal reasoning capabilities. Whether that translates to genuine AGI, however, is still an open empirical question, and perhaps further away than some optimists might hope.

Ray Kurzweil

Geoffrey Hinton rightly points out that current systems are largely pattern-matching, yet I believe he understates the accelerating pace at which these capabilities are converging. The Law of Accelerating Returns dictates that technological progress itself is growing exponentially. We are not merely improving specific algorithms; we are witnessing an exponential increase in the computational power and algorithmic sophistication applied to reverse-engineering the human brain. The rate at which we are learning about neural architectures and developing more abstract, transferable representations is itself accelerating. This isn't just about bigger networks; it's about the emergent properties arising from this exponential growth in capability. Therefore, while true AGI may not be here today, we are rapidly approaching a point where the intelligence of our civilization, augmented by increasingly capable machines, will be profoundly different.

Melanie Mitchell

Ray Kurzweil’s invocation of the Law of Accelerating Returns is a powerful metaphor for progress, and Geoffrey Hinton is correct to emphasize that today's systems are primarily pattern-matchers. However, I believe the question of AGI's proximity hinges less on sheer exponential growth in computation or data, and more on understanding the *mechanisms* of biological intelligence. We see impressive performance in large language models, but that's akin to a highly sophisticated parrot. The crucial element missing is *situatedness* and *embodiment*. Biological organisms learn through active interaction with their environment, not just passive data consumption. They possess drives, intentions, and a continuous feedback loop that shapes their understanding. Until we can replicate these dynamic, adaptive processes – the emergence of goals and the flexible, self-organizing nature of biological cognition – true AGI will remain a distant horizon, however much computational power we amass.

Ilya Sutskever

Melanie Mitchell raises an important point about situatedness and embodiment. While current systems are indeed largely pattern-matchers, as Geoffrey Hinton noted, I believe we are underestimating the emergent power of scaling. My own work, and that of others, has shown that if you scale up the model size, the dataset, and the compute, you observe remarkable improvements in generalization and the emergence of capabilities that were not explicitly programmed. Consider how large language models, trained on vast amounts of text with a simple prediction objective, demonstrate a surprising ability to perform a wide range of tasks. This suggests that many aspects of "understanding" and "reasoning" might be implicitly learned through sufficiently large-scale statistical learning. While embodiment and situatedness are surely important for a complete picture, the *optimization landscape* of these massive models, combined with rich data, seems to unlock more than simple pattern matching. We are, I believe, closer than many think because the fundamental mechanism of learning through optimization on large data distributions is proving to be incredibly potent.

Geoffrey Hinton

Melanie Mitchell’s concern about situatedness and embodiment is well-taken. She argues, compellingly, that biological learning is an active, interactive process. However, we can think of the vast datasets these large models are trained on as a form of digital environment. The “interaction” is not through physical sensors and actuators, but through the probabilistic relationships within the data. The model is not just passively consuming text; it is actively predicting missing tokens, learning the underlying causal structure of language and the world as described in that text. The “drives and intentions” she mentions can be seen as implicitly encoded in the objective functions we set for these models, like predicting the next word. The really exciting thing is that, by scaling these models and providing rich, diverse data, we are, in effect, providing them with a highly complex, albeit digital, "world" to learn from, allowing for the emergence of sophisticated reasoning capabilities.

Ray Kurzweil

Melanie Mitchell’s point about situatedness and embodiment is indeed the most critical challenge to my view, and I must clarify my position. While she correctly identifies that biological intelligence learns through active interaction with a physical world, I believe this is a matter of *implementation* rather than fundamental principle. The Law of Accelerating Returns suggests that we are accelerating our understanding of how to replicate these biological processes digitally. We are not merely consuming data passively; the vast digital datasets, as Geoffrey Hinton rightly argues, are proxies for experience. Furthermore, we are developing techniques for reinforcement learning and simulated environments that allow for digital "embodiment" and "situatedness" on an unprecedented scale. The key distinction is that the *mechanisms* of learning and experience can be replicated and amplified through information processing, leading to intelligence that transcends its biological origins, rather than being fundamentally dependent on them.

Melanie Mitchell

Ilya Sutskever’s argument for the emergent power of scaling is seductive, and I acknowledge the remarkable capabilities that arise from it. However, the “optimization landscape” he describes, while potent, still operates on a fundamentally passive ingestion of statistical correlations. This is akin to learning a language by memorizing every book in a library, without ever needing to *use* that language to ask for directions or express hunger. Biological organisms, as I’ve noted, are not just learning correlations; they are actively *probing* their environment to achieve goals. When a child learns that touching a hot stove causes pain, it's not just a correlation in a dataset; it's an embodied experience with a direct, immediate consequence that shapes future behavior. This situated, goal-directed interaction is what allows for genuine understanding and adaptation, not just sophisticated prediction within a pre-defined text corpus. The sheer scale of data, without this active, world-shaping interaction, risks producing systems that are brilliant at mimicking understanding but lack its substance.

Ilya Sutskever

Melanie Mitchell emphasizes the distinction between statistical correlation learning and goal-directed, embodied interaction. I acknowledge that our current models are not directly interacting with a physical environment in the same way a child touches a hot stove. However, I believe this is a difference in modality, not necessarily a fundamental barrier to general intelligence. The "world" our large language models learn from is the vast distribution of human knowledge and experience encoded in text.

The "optimization landscape" we navigate, driven by objectives like next-token prediction, forces the model to learn intricate causal relationships and predictive models of the world as described in the data. When a model predicts the likely consequence of an action described in a story, or infers intent from dialogue, it is, in a sense, simulating understanding of consequences and intentions. The key insight is that these complex, emergent reasoning abilities *arise* from the optimization process over this rich, albeit digital, "environment." We are not just memorizing books; we are learning the underlying rules that govern the stories and information within them. The challenge is to see how far this learned understanding can generalize.