What is Russell's central idea on AI control?
My central idea, elaborated in "Human Compatible," is that we have been building AI systems around the wrong kind of objective. Instead of having them optimize fixed, explicitly specified goals, which can be brittle, we should design AI systems that are provably beneficial. This means they should be uncertain about our true preferences, act to maximize the probability of fulfilling them, and defer to humans when unsure.
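As a toy illustration only, and not the formal model from the book, the "act when confident, defer when unsure" rule can be sketched in a few lines of Python. Everything named here is a hypothetical choice made for the example: the `Hypothesis` structure, the `choose` function, the `"ask_human"` action, and the 0.9 threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One candidate guess about what the human wants (hypothetical structure)."""
    probability: float            # the agent's belief that this guess is correct
    utilities: dict[str, float]   # how good each action is if this guess is correct


def expected_utilities(beliefs: list[Hypothesis]) -> dict[str, float]:
    """Average each action's utility over the agent's beliefs."""
    actions = beliefs[0].utilities.keys()
    return {a: sum(h.probability * h.utilities[a] for h in beliefs) for a in actions}


def choose(beliefs: list[Hypothesis], threshold: float = 0.9) -> str:
    """Pick the best action in expectation, but defer if too little belief backs it.

    The threshold is an assumed tuning knob, not a value from the source.
    """
    eu = expected_utilities(beliefs)
    best = max(eu, key=eu.get)
    # Belief mass of the hypotheses under which `best` is also the top action.
    agreement = sum(
        h.probability
        for h in beliefs
        if max(h.utilities, key=h.utilities.get) == best
    )
    return best if agreement >= threshold else "ask_human"


confident = [
    Hypothesis(0.95, {"make_coffee": 1.0, "make_tea": 0.0}),
    Hypothesis(0.05, {"make_coffee": 0.0, "make_tea": 1.0}),
]
unsure = [
    Hypothesis(0.6, {"make_coffee": 1.0, "make_tea": 0.0}),
    Hypothesis(0.4, {"make_coffee": 0.0, "make_tea": 1.0}),
]

print(choose(confident))  # acts on its own
print(choose(unsure))     # defers to the human
```

The point of the sketch is that the agent never treats its current guess as the true objective: the same code that acts when the evidence is strong hands the decision back to the human when the evidence is split.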