Is Reinforcement Learning only about rewards?
Rewards are the learning signal in RL, but the goal is not to maximize the immediate reward. What the agent maximizes is the *cumulative future* reward, called the return, which is often discounted so that rewards further in the future count for less. An agent may therefore forgo a small immediate reward if doing so leads to a much larger reward later. The focus is on the long-term consequences of whole sequences of actions, not on single-step gains, and the reward signal is what drives this long-term optimization.
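A small numeric sketch shows how maximizing the cumulative discounted return can favor waiting. It assumes a discount factor gamma = 0.9 and two hypothetical four-step reward sequences: a "greedy" one that takes a reward of 1 immediately, and a "patient" one that receives 10 only at the last step. The function name `discounted_return` is illustrative, not from any library.

```python
def discounted_return(rewards, gamma=0.9):
    """Compute G = r_0 + gamma*r_1 + gamma^2*r_2 + ... for a finite reward list."""
    g = 0.0
    # Accumulate from the last step backwards: G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical reward sequences, chosen only for illustration
greedy = [1.0, 0.0, 0.0, 0.0]    # small reward right away, nothing afterwards
patient = [0.0, 0.0, 0.0, 10.0]  # nothing at first, large reward at the end

g_greedy = discounted_return(greedy)
g_patient = discounted_return(patient)
print(f"greedy:  {g_greedy:.2f}")   # greedy:  1.00
print(f"patient: {g_patient:.2f}")  # patient: 7.29
```

With gamma = 0.9 the patient sequence is worth 10 × 0.9³ = 7.29, so forgoing the immediate reward pays off. With a much smaller discount such as gamma = 0.2, the same sequence is worth only 10 × 0.2³ = 0.08 and the greedy choice wins, which illustrates how the discount factor controls how far ahead the agent effectively looks.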