Mauro Sozio: Efficient Distributed Algorithms for Replication in P2P Networks and Large-Scale Information Extraction

14:00, Room N107 (Parc Club)


This talk shall consist of two parts. In the first part, we shall deal with the problem of replicating data items in an unstructured peer-to-peer network. We present a simple distributed greedy algorithm aiming at optimizing the probability of successfully retrieving the requested items. This is the first work determining both the degrees of replication and the placement of the replicas in a provably near-optimal way. We prove that our algorithm (coined P2R2) can guarantee a successful-search probability that is within a factor of 1/2 of the optimal solution.

We then present another distributed algorithm for a similar problem, which dramatically improves upon the greedy algorithm in terms of number of communication rounds (linear vs poly-logarithmic). Our algorithm is based on efficiently solving a linear program formulation of the problem in a distributed environment and bears the interesting feature that it can be easily implemented in map reduce. Our experiments show the viability and effectiveness of our approaches.

We conclude our talk by showing how our general methodology for solving a linear program in a distributed environment represents a valuable tool for solving many different problems, in particular in information extraction.

Permanent link to this article:

Leave a Reply