Reinforcement learning (RL) presents a promising approach to solving highly non-linear control tasks that are often challenging for classical control methods. In this talk, we present a learner-verifier framework for solving control tasks in stochastic systems with formal guarantees on reachability, safety or reach-avoidance specifications. Given a specification and a lower bound on the probability with which the specification should be satisfied, our framework jointly learns a control policy and a formal certificate which proves specification satisfaction with the desired probability. Both the control policy and the formal certificate are parametrized as neural networks. We use martingale certificates to formally verify probabilistic reachability, safety or reach-avoidance.