September 12, 2019
(joint work with Krishnendu Chatterjee and Raimundo Saona (IST Austria))
A Partially Observable Markov Decision Process (POMDP) is a discrete-time repeated decision problem in which, at each
period, the stage payoff depends both on the stage action and on the current state of the world. The state evolves
stochastically from one stage to the next. The decision-maker does not know the state, but receives a stream of
signals about it. One example is an investor who does not know the exact state of the economy, but learns about it
while making investment decisions. We consider long interactions, and prove that the decision-maker has approximately
optimal strategies with finite memory, which can therefore be implemented by a computer.
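To make the setting concrete, here is a minimal sketch of a POMDP along the lines of the investor example: a hidden two-state "economy", noisy signals, a Bayesian belief update, and a finite-memory strategy obtained by discretizing the belief into a fixed number of bins. All parameters (transition and signal probabilities, payoffs, the strategy itself) are illustrative assumptions, not the construction from the paper.

```python
import random

# Hypothetical 2-state POMDP: the "economy" is Good or Bad.
# All numeric parameters below are illustrative assumptions.
P_STAY = 0.9            # probability the state persists from one stage to the next
P_CORRECT_SIGNAL = 0.8  # probability the signal matches the true state

def step_state(state):
    """The state evolves stochastically from one stage to the next."""
    return state if random.random() < P_STAY else ("Bad" if state == "Good" else "Good")

def observe(state):
    """The decision-maker sees a noisy signal, never the state itself."""
    if random.random() < P_CORRECT_SIGNAL:
        return state
    return "Bad" if state == "Good" else "Good"

def update_belief(belief_good, signal):
    """Bayes update of P(state = Good) after one transition and one signal."""
    # Push the belief through the transition kernel.
    b = belief_good * P_STAY + (1 - belief_good) * (1 - P_STAY)
    # Likelihood of the observed signal under each state.
    like_good = P_CORRECT_SIGNAL if signal == "Good" else 1 - P_CORRECT_SIGNAL
    like_bad = 1 - P_CORRECT_SIGNAL if signal == "Good" else P_CORRECT_SIGNAL
    return (b * like_good) / (b * like_good + (1 - b) * like_bad)

def finite_memory_strategy(belief_good, levels=10):
    """A finite-memory strategy: round the belief to one of `levels + 1` bins,
    so the strategy only ever needs to remember one of finitely many values."""
    discretized = round(belief_good * levels) / levels
    return "invest" if discretized > 0.5 else "hold"

def simulate(horizon=1000, seed=0):
    """Average stage payoff of the discretized-belief strategy over a long run."""
    random.seed(seed)
    state, belief, total_payoff = "Good", 0.5, 0.0
    for _ in range(horizon):
        state = step_state(state)
        belief = update_belief(belief, observe(state))
        # Stage payoff depends on the stage action and the current state.
        if finite_memory_strategy(belief) == "invest":
            total_payoff += 1.0 if state == "Good" else -1.0
    return total_payoff / horizon

print(simulate())
```

The point of the discretization is that the exact belief takes infinitely many values, while the rounded belief takes finitely many, so the strategy can be run by a finite automaton; the result discussed above says that such finite-memory strategies can be made approximately optimal.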