Performance benchmarking for geo-distributed databases
Large-scale web applications today are typically built on high-performance cloud servers hosting a distributed database. Geo-replication across cloud data centres places data close to its users, which reduces network latency and keeps response times short.
A geo-distributed database distributes the shared data across different data centres (DCs) for better availability and performance. As the number of clients and their geographical reach grow, more DCs can be added to accommodate them. Large internet companies such as Google, Facebook, and Amazon rely on geo-replicated databases to offer their services with high availability and low latency. Unfortunately, writing correct programs for geo-replicated databases is more challenging than writing programs for a centralized database.
AntidoteDB is a geo-replicated database developed within a European collaboration that includes the DELYS group at LIP6. AntidoteDB offers a set of innovative features designed to make geo-distributed programming easier: CRDTs (Conflict-free Replicated Data Types), which are data structures that encapsulate the complexity of replication; transactional APIs that execute multiple operations atomically; and causal consistency, which ensures the correct ordering of updates.
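To give a feel for how a CRDT encapsulates replication, here is a minimal sketch of a state-based counter (a PN-counter) in Python. It is an illustration of the general technique only and does not use or reproduce the AntidoteDB API; the class name, method names, and replica identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PNCounter:
    """Illustrative state-based PN-counter CRDT (not AntidoteDB code)."""
    replica_id: str
    # Per-replica totals of increments and decrements.
    increments: dict = field(default_factory=dict)
    decrements: dict = field(default_factory=dict)

    def increment(self, amount=1):
        current = self.increments.get(self.replica_id, 0)
        self.increments[self.replica_id] = current + amount

    def decrement(self, amount=1):
        current = self.decrements.get(self.replica_id, 0)
        self.decrements[self.replica_id] = current + amount

    def value(self):
        return sum(self.increments.values()) - sum(self.decrements.values())

    def merge(self, other):
        # Element-wise maximum: commutative, associative, and idempotent,
        # so replicas converge whatever the order of merges.
        for rid, n in other.increments.items():
            self.increments[rid] = max(self.increments.get(rid, 0), n)
        for rid, n in other.decrements.items():
            self.decrements[rid] = max(self.decrements.get(rid, 0), n)


# Two hypothetical data centres update the counter concurrently.
paris = PNCounter("dc-paris")
tokyo = PNCounter("dc-tokyo")
paris.increment(5)
tokyo.increment(3)
tokyo.decrement(1)

# Exchanging state in either order yields the same value on both replicas.
paris.merge(tokyo)
tokyo.merge(paris)
print(paris.value(), tokyo.value())
```

Because the merge is an element-wise maximum, concurrent updates never conflict and no coordination between data centres is needed for the replicas to agree.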
Student Project Objectives
The objective of this student project is to reach a better understanding of AntidoteDB performance in realistic situations. This can be achieved by implementing standard benchmark programs and using them to measure metrics such as latency and throughput. Two widely used benchmarks are TPC-C and TPC-E, designed by the Transaction Processing Performance Council.
- Learn AntidoteDB APIs (how to use CRDTs and Transactional Causal Consistency).
- Learn how to install, deploy and monitor AntidoteDB on a cluster of machines.
- Learn the specifications of the TPC-C and TPC-E benchmarks.
- Implement TPC-C and TPC-E against the AntidoteDB API.
- Write benchmarking scripts to measure the performance of the benchmarks.
- Measure the performance of the benchmarks and investigate any bottlenecks.
- Write a report detailing the experimental setup, the performance results achieved, and the outcome of the investigation.
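As a rough idea of what a benchmarking script for latency and throughput might look like, here is a minimal single-client sketch in Python. The function names are hypothetical, and `placeholder_transaction` is only a stand-in for a real benchmark transaction (for example a TPC-C transaction issued against the database); a realistic harness would also need concurrent clients, a warm-up phase, and several runs.

```python
import statistics
import time


def run_benchmark(transaction, num_transactions):
    """Run `transaction` repeatedly and report latency and throughput.

    `num_transactions` must be at least 1.
    """
    latencies = []
    start = time.perf_counter()
    for _ in range(num_transactions):
        t0 = time.perf_counter()
        transaction()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    latencies.sort()
    # Index of the 99th-percentile sample, clamped to the last element.
    p99_index = min(len(latencies) - 1, int(0.99 * len(latencies)))
    return {
        "count": len(latencies),
        "throughput_tps": len(latencies) / elapsed,
        "mean_latency_s": statistics.mean(latencies),
        "median_latency_s": statistics.median(latencies),
        "p99_latency_s": latencies[p99_index],
    }


def placeholder_transaction():
    # Stand-in workload (an assumption for this sketch); replace with a
    # call that executes one benchmark transaction against the database.
    sum(range(1000))


results = run_benchmark(placeholder_transaction, 200)
for name, value in results.items():
    print(f"{name}: {value}")
```

Reporting tail latency (here the 99th percentile) alongside the mean is useful in a geo-distributed setting, where a small fraction of slow requests can dominate the user experience.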
This internship is for students strongly motivated by advanced technologies in geo-scale distributed databases, distributed systems, distributed algorithms, and consistency. To apply, send your CV and two references to firstname.lastname@example.org