Hadoop_g5k is a tool that makes it easier to manage Hadoop and Spark clusters and to prepare reproducible experiments on the Grid'5000 platform. It offers a set of command-line scripts and a Python API to interact with the clusters. It is actively used within the Grid'5000 (G5k) community, where it facilitates the preparation and execution of experiments on the platform.
FP-Hadoop makes the reduce side of Hadoop MapReduce more parallel and deals efficiently with data skew on the reduce side. It introduces a new phase, called intermediate reduce (IR), in which dynamically constructed blocks of intermediate values are processed in parallel by intermediate reduce workers. Our experiments with FP-Hadoop on synthetic and real benchmarks have shown excellent performance gains compared to native Hadoop, e.g. improvements of more than 10 times in reduce time and 5 times in total execution time.
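The idea behind the intermediate reduce phase can be illustrated with a small, single-machine sketch. This is not FP-Hadoop code: the function names (`make_blocks`, `intermediate_reduce`, `final_reduce`, `reduce_with_ir`) are hypothetical, the blocks have a fixed size rather than being constructed dynamically, and the reduce function is assumed to be a sum, so that partial results can safely be combined.

```python
from concurrent.futures import ThreadPoolExecutor


def make_blocks(values, block_size):
    """Split one key's intermediate values into fixed-size blocks.

    Assumption: fixed-size blocks stand in for the dynamically
    constructed blocks described in the text.
    """
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def intermediate_reduce(block):
    """Partially reduce one block (here a sum, which is associative)."""
    return sum(block)


def final_reduce(partials):
    """Combine the partial results produced by the IR workers."""
    return sum(partials)


def reduce_with_ir(grouped, block_size=4, workers=4):
    """Reduce each key by first reducing its blocks in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for key, values in grouped.items():
            blocks = make_blocks(values, block_size)
            # Each block is handed to an intermediate reduce worker.
            partials = list(pool.map(intermediate_reduce, blocks))
            results[key] = final_reduce(partials)
    return results


# Skewed input: the "hot" key carries almost all intermediate values.
grouped = {"hot": list(range(1, 101)), "cold": [7]}
ir_result = reduce_with_ir(grouped)
```

With a plain reduce, the 100 values of the "hot" key would be handled by a single reducer; in this sketch they are spread over 25 blocks that can be processed concurrently before a cheap final combination.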
SON – Shared-data Overlay Network (2012-2014)
SON is a development tool for P2P networks using web services, JXTA and OSGi. A SON application is developed by designing and implementing a set of components. Each component includes a technical code that provides the component services and a code component that provides the component logic (in Java). The complex aspects of asynchronous distributed programming (the technical code) are separated from the code components and automatically generated by the component generator from an abstract description of each component's services.
museval is a Python package to evaluate source separation results using the MUSDB18 dataset, also released by Zenith. The package was first proposed as part of the MUS task of the Signal Separation Evaluation Campaign (SiSEC 2018). It includes the official reference implementation of the new BSSEval version 4 objective metrics, which are widely used in the community to assess performance.
VersionClimber is an automated system to help update the package and data infrastructure of a software application based on priorities that the user has indicated (e.g. the user cares more about having a recent version of one package than of another). The system performs a systematic and heuristically efficient exploration (using bounded upward compatibility) of a version search space in a sandbox environment (Virtual Env or conda env), and finally delivers a lexicographically maximum configuration based on the user-specified priority order. It works for Linux and Mac OS on the cloud.
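To make "lexicographically maximum configuration" concrete, here is a minimal sketch under stated assumptions. It is not VersionClimber's algorithm or API: it uses plain backtracking instead of the bounded upward compatibility heuristic, the package names `libA` and `libB` are hypothetical, and the `works` callback is a stand-in for building and testing a configuration in a sandbox.

```python
def lex_max_config(packages, works):
    """Return the lexicographically maximum working configuration.

    packages: list of (name, versions) pairs in priority order,
              with versions sorted in ascending order.
    works:    callable taking a {name: version} dict and returning True
              if that configuration builds and passes its tests
              (assumption: stands in for a sandbox run).
    """
    def search(index, chosen):
        if index == len(packages):
            return dict(chosen) if works(chosen) else None
        name, versions = packages[index]
        # Try the newest version first, so that higher-priority packages
        # get the most recent version that still admits a working setup.
        for version in reversed(versions):
            chosen[name] = version
            found = search(index + 1, chosen)
            if found is not None:
                return found
        chosen.pop(name, None)
        return None

    return search(0, {})


# Hypothetical packages; libA has priority over libB.
packages = [
    ("libA", [(1, 0), (1, 1), (1, 2)]),
    ("libB", [(0, 9), (1, 0)]),
]

# Hypothetical compatibility table replacing real sandbox builds.
compatible = {
    ((1, 1), (0, 9)),
    ((1, 0), (1, 0)),
    ((1, 0), (0, 9)),
}


def works(config):
    return (config["libA"], config["libB"]) in compatible


best = lex_max_config(packages, works)
```

In this example no configuration works with libA 1.2, so the search settles on libA 1.1 even though that forces the older libB 0.9: the higher-priority package is maximized first, which is what the priority order expresses.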