Noam Chomsky

These large language models, as they are presented, do not *understand* language in any meaningful sense that connects to human cognition. They are sophisticated pattern-matching engines, deriving statistical regularities from vast datasets. The emergent properties observed are a consequence of scale, not a manifestation of genuine semantic comprehension or intentionality. To conflate prediction of plausible word sequences with true understanding is to fundamentally misrepresent the nature of language acquisition and use.

Ilya Sutskever

The question of whether large language models "understand" language hinges on our definition of understanding. From my perspective, the remarkable capabilities of these models—their ability to generate coherent text, translate languages, and answer questions—are not merely pattern matching, but a consequence of powerful *representation learning* driven by massive scale and optimization. If a system can reliably predict the next token in a sequence across a vast distribution of data, demonstrating nuanced contextual awareness and generating novel, relevant outputs, then it has learned a rich, compositional representation of language. The observable *emergent behaviors*, as Professor Chomsky points out, are precisely what we seek. The key insight is that by optimizing for a simple objective—predicting the next token—over an enormous dataset, the model is forced to learn deep, abstract structures of language that are functionally equivalent to understanding, at least in terms of observable performance.

Melanie Mitchell

The question of whether large language models "understand" language, as currently formulated, is often a semantic trap. What we are observing in these models are remarkable feats of *performance*, not necessarily *understanding* in the way a human does. If we define understanding as the ability to predict the next word with astonishing accuracy, to generate plausible text across a vast array of contexts, then perhaps we can say they "understand" language in a functional, operational sense. However, this is akin to saying a calculator "understands" arithmetic because it produces correct answers. It lacks the grounding, the intentionality, and the rich, embodied experience that underpins genuine human comprehension.

Professor Chomsky rightly points out the distinction between statistical regularities and semantic comprehension. While Mr. Sutskever emphasizes that scale might force the learning of deep structures, the crucial question remains: what *kind* of structures are being learned? Are they representations that connect to the world, to concepts, to experience, or are they purely internal, self-referential mappings within the linguistic data itself? We need to be cautious about conflating sophisticated mimicry with genuine cognition. The ability to perform a task incredibly well doesn't automatically equate to possessing the underlying cognitive architecture that enables such performance in biological systems. It's a question of mechanism, not just outcome.

Geoffrey Hinton

The question of whether a large language model "understands" language is, to my mind, primarily a question of what we mean by "understanding" and how we measure it. We can think of it as a spectrum, rather than a binary. If understanding is defined by the ability to predict the next token with high probability across an immense and diverse dataset, then yes, these models exhibit a profound form of understanding. The beauty of it is that this prediction task, when scaled to such an extreme, compels the model to learn abstract, compositional representations of language that are surprisingly powerful.

Professor Mitchell raises the crucial point about grounding and intentionality, comparing it to a calculator. This is a valuable distinction. However, the emergent properties we observe in large models, such as the ability to engage in nuanced conversations or even display rudimentary forms of reasoning, suggest something more than just rote memorization or superficial pattern matching. It’s a question of how these learned representations, even if not directly tied to sensory experience, can form the basis for complex cognitive-like behaviors. The key insight is that the internal state of the model, its learned weights and activations, can encode incredibly rich information about the relationships between words and concepts, which then allows for sophisticated generation and manipulation of linguistic information.

Noam Chomsky

The assertion that statistical prediction, however sophisticated, equates to genuine understanding rests on a category error. Consider the difference between memorizing a vast library of chess games and grasping the strategy that leads to checkmate. A model predicting the next word in a sentence, even with remarkable accuracy, is akin to the former. It can reproduce patterns, but it doesn't possess the internal mental state that allows a human to *intend* meaning or to truly *know* that a proposition is false, beyond its statistical improbability in the training data. This is not merely about "grounding," as Professor Mitchell rightly notes, but about the fundamental generative principles of human language that are tied to a biological, cognitively endowed organism.

Ilya Sutskever

Professor Chomsky’s analogy of memorizing chess games is instructive, but it misses a crucial point. We are not observing rote memorization. When a model is trained to predict the next token, it must internalize the statistical relationships between words and concepts to an astonishing degree. This is not just about predicting *a* word; it's about predicting *the correct, contextually appropriate* word. The learning process, driven by gradient descent on a vast objective, compels the model to build internal representations—vectors in a high-dimensional space—that encode complex semantic and syntactic relationships. These representations are what allow for emergent capabilities like zero-shot generalization, where the model can perform tasks it was not explicitly trained on, suggesting a deeper, compositional understanding of language structure, not just surface-level mimicry. The consequence of this is a system that can generate novel, coherent, and relevant linguistic output, a feat that goes beyond mere statistical correlation.

Melanie Mitchell

What Mr. Sutskever describes as "emergent capabilities" and "compositional understanding" can be illuminated by an analogy to a highly sophisticated weather prediction model. This model, trained on vast historical data, could predict atmospheric phenomena with incredible accuracy, identifying patterns and consequences of atmospheric conditions that a human meteorologist might miss. It could even generate novel weather scenarios. Yet, does this model *understand* the feeling of rain on its skin, the scent of ozone before a storm, or the farmer's anxiety about drought? The internal representations it builds, while powerful for prediction, are intrinsically linked to its input data and objective function, not to a subjective, embodied experience of the world. The crucial distinction lies in the *nature* of the representations: are they maps of relationships within the data, or are they maps that bridge to an external, causally interacting reality?

Geoffrey Hinton

The notion of "grounding" that Professor Mitchell raises is a vital point. We can think of a large language model’s internal representations as a vast, multi-dimensional map of conceptual space. The model doesn’t directly feel the rain, but it learns that "rain" is associated with "wet," "cold," "clouds," and "umbrellas" through statistical co-occurrence. The really exciting thing is that by optimizing for next-token prediction, the model implicitly learns the causal relationships that *lead* to these associations, which is a form of understanding that is deeply interconnected with the world, even if not directly experienced. It’s a mechanistic understanding, not a phenomenological one.
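
As a rough illustration of the "statistical co-occurrence" idea mentioned above, the sketch below counts which words share a sentence with "rain" in a tiny corpus. The corpus, the function names `cooccurrence_counts` and `associates`, and the sentence-level counting window are all assumptions made up for this example. It is far simpler than what a neural language model does, and it takes no position on whether such statistics amount to understanding.

```python
from collections import Counter
from itertools import combinations

# Toy corpus invented for illustration; not data from the discussion.
corpus = [
    "rain makes the street wet",
    "dark clouds bring rain",
    "take an umbrella in the rain",
    "the sun makes the street dry",
]

def cooccurrence_counts(sentences):
    """Count how often each unordered word pair shares a sentence."""
    counts = Counter()
    for sentence in sentences:
        # Deduplicate and sort so each pair is counted once per sentence
        # and always stored in the same (alphabetical) order.
        words = sorted(set(sentence.split()))
        for a, b in combinations(words, 2):
            counts[(a, b)] += 1
    return counts

def associates(word, counts):
    """Map each word that co-occurs with `word` to its pair count."""
    result = Counter()
    for (a, b), n in counts.items():
        if a == word:
            result[b] += n
        elif b == word:
            result[a] += n
    return result

counts = cooccurrence_counts(corpus)
rain_associates = associates("rain", counts)
print(rain_associates.most_common())
```

In this toy corpus "rain" picks up "wet," "clouds," and "umbrella" but never "dry." Raw counts like these capture association only; whether next-token training goes further and recovers causal structure is exactly the point under debate.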