(joint work with Krishnendu Chatterjee and Raimundo Saona (IST Austria))
A Partially Observable Markov Decision Process (POMDP) is a discrete-time sequential decision problem in which, at each
period, the stage payoff depends both on the stage action and on the current state of the world. The state evolves
stochastically from one stage to the next. The decision-maker does not observe the state, but receives a stream of
signals about it. One example is an investor who does not know the exact state of the economy, but learns about it while
making investment decisions. We consider a long interaction, and prove that the decision-maker has approximately
optimal strategies with finite memory, which can therefore be implemented by a computer.
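As a toy illustration of the objects in this abstract (not the construction from the paper), the sketch below simulates a hypothetical two-state, two-action, two-signal POMDP with made-up transition, signal, and payoff numbers. The decision-maker never sees the state; she updates a belief from the observed signals and plays a finite-memory strategy obtained by quantizing that belief into a small number of memory states.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical POMDP parameters (illustrative numbers only).
# transitions[a][s, s'] : P(next state s' | current state s, action a)
transitions = np.array([[[0.9, 0.1], [0.2, 0.8]],
                        [[0.7, 0.3], [0.4, 0.6]]])
# signals[s, o] : P(signal o | current state s)
signals = np.array([[0.8, 0.2], [0.3, 0.7]])
# payoff[s, a] : stage payoff when acting a in state s
payoff = np.array([[1.0, 0.0], [0.0, 1.0]])

def belief_update(belief, action, obs):
    """Bayes update of the belief over states after acting and observing."""
    pred = belief @ transitions[action]      # predicted state distribution
    post = pred * signals[:, obs]            # reweight by signal likelihood
    return post / post.sum()

def finite_memory_action(belief, levels=10):
    """Quantize the belief to one of `levels` memory states, act greedily.
    The memory state is the rounded belief, so the strategy only needs
    finitely many internal states."""
    m = round(belief[1] * (levels - 1))
    b = np.array([1.0 - m / (levels - 1), m / (levels - 1)])
    return int(np.argmax(b @ payoff))

def average_payoff(T=10_000):
    state, belief, total = 0, np.array([0.5, 0.5]), 0.0
    for _ in range(T):
        a = finite_memory_action(belief)
        total += payoff[state, a]
        state = int(rng.choice(2, p=transitions[a][state]))
        obs = int(rng.choice(2, p=signals[state]))
        belief = belief_update(belief, a, obs)
    return total / T

avg = average_payoff()
print(round(avg, 3))
```

The quantized-belief strategy here is only one simple way to build a finite-memory strategy; the paper's result concerns the existence of approximately optimal such strategies in long interactions, not this particular construction.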
In this talk, we first present the seminal works at the basis of the theory of Bayesian neural networks. These include Radford Neal's result from the 1990s on the connection between Gaussian processes and wide neural networks, and the recent extensions of this result to deep neural networks.

In a second part, we focus on understanding priors in Bayesian neural networks at the unit level. More specifically, we investigate deep Bayesian neural networks with Gaussian weight priors and a class of ReLU-like nonlinearities. We establish that the induced prior distribution on the units, both before and after activation, becomes increasingly heavy-tailed as the depth of the layer increases.
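The heavy-tail phenomenon can be observed numerically. The sketch below (a rough Monte Carlo illustration, not the paper's proof technique; width, depth, and scaling are assumptions) samples networks with i.i.d. Gaussian weight priors and ReLU activations, records the pre-activation of a single unit at a given layer, and compares the sample excess kurtosis at a shallow versus a deeper layer: heavier tails show up as larger kurtosis.

```python
import numpy as np

rng = np.random.default_rng(1)

def final_unit_samples(depth, width=64, n_samples=4000):
    """Draw n_samples networks from the prior (i.i.d. Gaussian weights,
    He-style 1/sqrt(width) scaling) and return the pre-activation of one
    unit at layer `depth`, each evaluated at a fresh Gaussian input."""
    out = np.empty(n_samples)
    for i in range(n_samples):
        h = rng.standard_normal(width)
        for _ in range(depth - 1):
            W = rng.normal(scale=np.sqrt(2.0 / width), size=(width, width))
            h = np.maximum(W @ h, 0.0)            # ReLU activation
        w = rng.normal(scale=np.sqrt(2.0 / width), size=width)
        out[i] = w @ h                            # pre-activation of one unit
    return out

def excess_kurtosis(x):
    """Sample excess kurtosis: 0 for a Gaussian, > 0 for heavier tails.
    Scale-invariant, so the layer-wise variance decay does not matter."""
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)

k_shallow = excess_kurtosis(final_unit_samples(depth=1))
k_deep = excess_kurtosis(final_unit_samples(depth=4))
print(k_shallow, k_deep)
```

With these settings the depth-1 unit is close to Gaussian (a wide Gaussian scale mixture), while the depth-4 unit exhibits markedly larger kurtosis, consistent with the tails growing heavier with depth.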