SOLUTION: maximize the expected total reward by choosing an optimal policy.

The name “Markov processes” first appeared historically as a misspelling of “Mark-Off processes”, a name previously used for random processes that describe learning in certain types of video games, but has since become standard terminology. The goal of (risk-neutral) reinforcement learning is to maximize the expected total reward by choosing an optimal policy. The goal of (risk-neutral) reinforcement learning is to neutralize risk, i.e., to make the variance of the total reward equal to zero. The goal of risk-sensitive reinforcement learning is to teach an RL agent to pick action policies that are most prone to risk of failure. Risk-sensitive RL is used, e.g., by venture capitalists and other sponsors of RL research, as a tool to assess the feasibility of new RL projects.

Chapter 2 Probabilistic Modeling