Topic: Cloud platforms rely on technologies and architectures that handle massive distribution of data and computation. They are usually provided and maintained by major companies (Amazon, Google, Yahoo, Microsoft). Hadoop is an open source platform written in Java for managing and processing data in a cloud environment. It is maintained by the Apache Software Foundation and implements Google's MapReduce programming model. Today, most solutions for data mining in the cloud are straightforward implementations of existing algorithms in the programming model of the selected cloud platform. A basic illustration is the MapReduce implementation of the Apriori algorithm, which performs successive counting steps that rely on the native cloud primitives.
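As a rough illustration of one such counting step, the sketch below mimics the map and reduce phases in plain Java, without any Hadoop dependency. All class and method names are hypothetical, and the sketch is deliberately simplified: it emits every itemset of size k from each transaction rather than only the candidates built from frequent itemsets of size k-1, as the full Apriori algorithm would. Transactions are assumed to contain no duplicate items.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AprioriCountingSketch {

    // "Map" phase: emit one key per size-k itemset found in a transaction.
    static List<String> mapPhase(List<String> transaction, int k) {
        List<String> sorted = new ArrayList<>(transaction);
        Collections.sort(sorted); // canonical order so equal itemsets share a key
        List<String> keys = new ArrayList<>();
        combine(sorted, k, 0, new ArrayList<>(), keys);
        return keys;
    }

    // Enumerate all combinations of k items, starting at index 'start'.
    private static void combine(List<String> items, int k, int start,
                                List<String> current, List<String> out) {
        if (current.size() == k) {
            out.add(String.join(",", current));
            return;
        }
        for (int i = start; i < items.size(); i++) {
            current.add(items.get(i));
            combine(items, k, i + 1, current, out);
            current.remove(current.size() - 1);
        }
    }

    // "Reduce" phase: sum the occurrences of each key and keep the frequent ones.
    static Map<String, Integer> reducePhase(List<String> keys, int minSupport) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        counts.values().removeIf(c -> c < minSupport);
        return counts;
    }

    // One counting pass over the whole database for itemsets of size k.
    static Map<String, Integer> countPass(List<List<String>> transactions,
                                          int k, int minSupport) {
        List<String> emitted = new ArrayList<>();
        for (List<String> t : transactions) {
            emitted.addAll(mapPhase(t, k));
        }
        return reducePhase(emitted, minSupport);
    }

    public static void main(String[] args) {
        List<List<String>> db = Arrays.asList(
                Arrays.asList("bread", "milk"),
                Arrays.asList("bread", "butter", "milk"),
                Arrays.asList("butter", "milk"));
        System.out.println(countPass(db, 1, 2)); // frequent single items
        System.out.println(countPass(db, 2, 2)); // frequent pairs
    }
}
```

In an actual Hadoop job, the two phases would run as distributed Mapper and Reducer tasks, and one such job would be launched for each itemset size.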
However, not all algorithms can have such straightforward implementations. This work focuses on a set of major data mining algorithms and on optimizing Hadoop for them. Such algorithms have to be useful for different applications (e.g., finding frequent itemsets and sequential patterns, clustering, etc.).
Missions and activities:
Your mission will consist of:
- Proposing efficient algorithms for a set of well-known data mining problems (frequent itemsets, clustering) that require specific adaptation to the cloud.
- Implementing the proposed algorithms on top of Hadoop.
- Performing experiments on real scientific data, using an experimental platform for large-scale parallel and distributed systems, to evaluate the performance of the proposed algorithms on the data mining problems addressed.
Skills and profiles:
– Strong knowledge of statistics.
– Good proficiency in English.
– Good programming skills in Java.
– A Ph.D. in computer science or mathematics.
Duration, Location and Salary:
Duration is 18 months and the location is Montpellier.
The position should be filled by September 2013 (however, a start date as late as December 2013 may be negotiated). The position might be extended to 24 months in total (depending on how the funding evolves).
The net salary is €2138 and includes social security (gross salary is €2620.84).
This post-doc will take place in the Zenith team of INRIA. It is funded by the Datascale project, which is funded by the French Government and involves industrial and academic partners (Bull, Armadillo, ActiveEon, Twenga, XediX, CEA, INRIA, IPGP). The project aims at developing technologies for Big Data.
The Zenith project-team of INRIA, headed by Patrick Valduriez, aims to propose new solutions related to scientific data and activities. Our research topics incorporate the management and analysis of massive and complex data, such as uncertain data, in highly distributed environments.
Our team is located in Montpellier, a very active city in the south of France. The city is home to major research labs working on environment and health, such as INRA, CIRAD and IRD. Generally speaking, these scientific activities generate extremely large amounts of complex data that need to be managed and analyzed.
- Patrick Valduriez
- Florent Masseglia
- Reza Akbarinia