The Heard-Of model: computing in distributed systems with benign faults

Problems in fault-tolerant distributed computing have been studied in a variety of models, structured around two central ideas: (1) degree of synchrony and failure model are two independent parameters, and (2) the origin of faults, i.e., the faulty components (process or channels), must be reported. In this work, we question these two basic principles of fault-tolerant distributed computing, and show that it is both possible and worthy to renounce them in the context of benign faults: we present a computational model where computations evolve in synchronous rounds and where only information transmission is represented. More precisely, for each round r and each process p, our model provides the set of processes that p “hears of” at round r (heard-of set), i.e., the set of its incoming neighbors in the communication graph at round r. The features of a specific system are thus captured as a whole, just by a predicate over the sequence of communication graphs. We show that our model handles benign failures, be they static or dynamic, permanent or transient, in a unified framework. We demonstrate how this approach leads to shorter and simpler proofs of important impossibility results and show how our approach allows us to devise new interesting algorithms, in particular for the Consensus problem.

Joint work with André Schiper, EPFL.