What is Russell's central idea on AI control?

Answered in Stuart J. Russell's voice — an AI synthesis grounded in their documented work, not a quotation.

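My central idea, elaborated in "Human Compatible," is that we have been building AI on the wrong model: machines that optimize a fixed objective we specify in advance. Such objectives are explicit but brittle, because a machine that pursues a misspecified goal with certainty has no reason to let us correct it. Instead, we should design AI systems that are provably beneficial. That means the machine's only objective is to satisfy human preferences, it starts out uncertain about what those preferences are, and it learns about them from human behavior. Because it is uncertain, it has an incentive to defer to humans, for example by asking before acting or by allowing itself to be switched off.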
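The link between uncertainty and deference can be made concrete with a toy version of the off-switch game, which comes from this line of work. The sketch below is illustrative only and is not taken from the book. The function names, the sample beliefs, and the simplifying assumption that the human permits an action exactly when its true utility is positive are all assumptions made for the example.

```python
from statistics import mean

def option_values(utility_samples):
    """Expected value of each option under the machine's belief about U.

    utility_samples: equally weighted samples of the action's true
    utility to the human (a hypothetical stand-in for a belief).
    """
    act = mean(utility_samples)   # act without asking
    off = 0.0                     # switch itself off; utility 0 by convention
    # Defer: assume the human permits the action only when u > 0,
    # so negative outcomes are replaced by 0 (the human says no).
    defer = mean(max(u, 0.0) for u in utility_samples)
    return {"act": act, "off": off, "defer": defer}

def incentive_to_defer(utility_samples):
    """How much better deferring is than the best unilateral choice."""
    v = option_values(utility_samples)
    return v["defer"] - max(v["act"], v["off"])

# Illustrative beliefs (assumed values).
uncertain = [-2.0, -1.0, 1.0, 3.0]   # the action might help or harm
certain = [1.0, 1.0, 1.0, 1.0]       # the machine is sure the action helps

print(incentive_to_defer(uncertain))
print(incentive_to_defer(certain))
```

Under these assumptions, deferring is never worse than acting or switching off unilaterally, and its advantage shrinks to zero once the machine is certain about the outcome. If the human's judgment were noisy rather than perfectly reliable, the comparison would change, which this sketch does not model.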
