Talk by Amelie Zhou on Jun 15, 2017

Amelie Zhou, postdoc in the Ascola research team will talk about On Achieving Efficient Data Transfer for Graph Processing in Geo-Distributed Datacenters.

Graph partitioning, which distributes graph processing workloads to multiple machines for better parallelism, is an important problem for optimizing the performance and communication cost of graph processing jobs. Recently, many graph applications such as social networks store their data on geo-distributed datacenters (DCs) to ensure flexible and low-latency services. This raises new challenges to existing graph partitioning methods, due to the costly Wide Area Network (WAN) bandwidths and the heterogeneous network bandwidths in the geo-distributed DCs. In this paper, we propose a heterogeneity-aware graph partitioning method named G-Cut, which aims at minimizing the runtime of graph processing jobs in geo-distributed DCs while satisfying the WAN usage budget. G-Cut is a two-stage graph partitioning method. In the traffic-aware graph partitioning stage, we adopt the one-pass edge
assignment to place edges into different partitions while minimizing the inter-DC data traffic size. In the network-aware partition refinement stage, we map the partitions obtained in the first stage onto different
DCs in order to minimize the inter-DC data transfer time. We evaluate the effectiveness and efficiency of G-Cut using real-world graphs. The evaluation results show that G-Cut is able to obtain both lower data
transfer time and WAN usage compared to the state-of-the-art graph partitioning methods.