Lirmm and Inria will run a stand at the Genopolys science village for the Fête de la science. Join us on Thursday, October 10 and Friday, October 11 for school groups, and on Saturday, October 12 for the general public. On the programme: films, workshops (bottles and oceans, a kit of unplugged activities, ...) and the Datagramme game!
Permanent link to this article: https://team.inria.fr/zenith/fete-de-la-science-participation-de-zenith-au-village-des-sciences-de-genopolys-3-jours/
Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-alexis-jolyplntnet-interactive-plant-identification-and-collaborative-information-system-sept-20-2pm/
Sep 13
Zenith seminar: Irina Alles, “Time Series Clustering in the Field of Agronomy”, Sept 13, 2pm.
Irina Alles will present her work on phenotypic data clustering on September 13 at 2pm (Galéra, room 127).
Title: Time Series Clustering in the Field of Agronomy
Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-irina-allestime-series-clustering-in-the-field-of-agronomy-sept-13-2pm/
Permanent link to this article: https://team.inria.fr/zenith/post-doc-offer-optimizing-the-cloud-for-data-mining/
Permanent link to this article: https://team.inria.fr/zenith/seminaire-du-pole-donnees-connaissances-mohamed-reda-bouadjenek-approaches-and-algorithms-for-information-retrieval-based-on-social-network-analysismining-5-juillet-11h00/
Permanent link to this article: https://team.inria.fr/zenith/seminaire-du-pole-donnees-connaissances-manuel-serrano-des-ordinateurs-aux-tablettes-la-programmation-du-web-diffus-1er-juillet-14h30/
Permanent link to this article: https://team.inria.fr/zenith/seminaire-du-pole-donnees-connaissances-eliya-buyukkaya-a-peer-to-peer-based-virtual-environment-system-1er-juillet-11h00/
Permanent link to this article: https://team.inria.fr/zenith/numev-reunion-de-laxe-donnees-mardi-21-mai-10h30-12h/
May 17
PhD offer: Multisite Management of Data-intensive Scientific Workflows in the Cloud
Directors: Esther Pacitti (University Montpellier 2), Marta Mattoso (UFRJ) and Patrick Valduriez (Inria)
Contact: Patrick.Valduriez@inria.fr
Funding: The joint Microsoft-Inria Research Center
Gross salary: 1957 euros/month (36 months)
This work is part of a new project on advanced data storage and processing for cloud workflows (2013-2017), funded by Microsoft Research and carried out in collaboration with the Kerdata Inria team. It will be conducted within the Institut de Biologie Computationnelle in Montpellier.
Scientific workflows allow scientists to easily express multi-step computational tasks, for instance, load input data files, preprocess the data, run various analyses, and aggregate the results. A scientific workflow describes the dependencies between tasks, typically as a Directed Acyclic Graph (DAG) where the nodes are tasks (that can call programs) and the edges express the task dependencies. As scientific workflows need to deal more and more with big data, it becomes critical to process them in high-performance computing environments such as clusters or clouds. Some scientific workflow systems such as Pegasus and Swift provide parallel support but with an imperative language, which forces optimization and parallelization to be hardcoded.
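As a minimal sketch of the DAG view described above (not taken from Pegasus, Swift or any system named here), the following Python snippet encodes a hypothetical five-task workflow and groups its tasks into stages that could run in parallel. The task names and the `execution_stages` helper are assumptions made for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "load": set(),
    "preprocess": {"load"},
    "analysis_a": {"preprocess"},
    "analysis_b": {"preprocess"},
    "aggregate": {"analysis_a", "analysis_b"},
}


def execution_stages(dag):
    """Group tasks into stages. Tasks in the same stage do not depend on
    each other, so they could be executed in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all tasks whose inputs are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(workflow)
for number, tasks in enumerate(stages, start=1):
    print(f"stage {number}: {tasks}")
```

Here the two analyses land in the same stage because they only depend on the preprocessing step, which is the kind of parallelism an engine can derive from the dependency graph rather than from hardcoded control flow.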
To be amenable to automatic optimization and parallel processing, the specification of a workflow should be high-level. Recently [1], we have proposed an algebraic approach for the optimization and parallelization of data-intensive scientific workflows. This approach is based on a workflow algebra with powerful operators such as Filter, Map and Reduce, a set of algebraic transformation rules as a basis for optimization and a parallel execution model. It has been implemented in Chiron [2] in a cluster environment.
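To give a rough feel for what an algebraic specification buys, here is a small Python sketch of Filter, Map and Reduce operators over relations (lists of dictionaries), together with one rewrite of the kind an optimizer might apply. This is not Chiron's API or the algebra of [1]; the operator signatures, the sample data and the rewrite rule shown are simplifying assumptions.

```python
# Relations are modelled as lists of dict tuples (an assumption of this sketch).

def map_op(fn, relation):
    """Map: produce one output tuple per input tuple."""
    return [fn(t) for t in relation]


def filter_op(pred, relation):
    """Filter: keep only the tuples that satisfy the predicate."""
    return [t for t in relation if pred(t)]


def reduce_op(key, fn, relation):
    """Reduce: group tuples by an attribute, one output tuple per group."""
    groups = {}
    for t in relation:
        groups.setdefault(t[key], []).append(t)
    return [fn(k, ts) for k, ts in groups.items()]


# Hypothetical input data and activities.
relation = [
    {"sample": "s1", "site": "A", "value": 3.0},
    {"sample": "s2", "site": "B", "value": 5.0},
    {"sample": "s3", "site": "A", "value": 4.0},
]


def scale(t):
    return {**t, "score": t["value"] * 2}


def at_site_a(t):
    return t["site"] == "A"


def total(site, tuples):
    return {"site": site, "total": sum(t["score"] for t in tuples)}


# Plan as written: Map, then Filter, then Reduce.
plan_naive = reduce_op("site", total, filter_op(at_site_a, map_op(scale, relation)))

# Rewritten plan: the Filter only reads "site", which Map leaves untouched,
# so it can be applied first and Map processes fewer tuples.
plan_rewritten = reduce_op("site", total, map_op(scale, filter_op(at_site_a, relation)))

print(plan_naive, plan_rewritten)
```

Both plans return the same relation, but the rewritten one runs the (potentially expensive) Map activity on two tuples instead of three. Because the operators are declarative, such a reordering can be decided by the engine rather than by the workflow author.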
In this thesis, we consider the problem of managing algebraic workflows so that they run efficiently in a multisite cloud environment, where each site has its own cluster, data and programs. Such an environment is well suited to scientific communities, with groups and labs located at geographically dispersed sites. The problem resembles multisite query processing in distributed and parallel database systems [3,4], and we plan to develop similar techniques for workflow decomposition, optimization and parallelization, dynamic task allocation, and efficient management of the intermediate data exchanged between sites. These techniques will be validated by a prototype implemented using the BlobSeer distributed storage system [5] on Microsoft Azure.
Note: a second Ph.D. position related to the joint project is available in the Kerdata team.
References:
[1] E. Ogasawara, J. F. Dias, D. de Oliveira, F. Porto, P. Valduriez, M. Mattoso. An Algebraic Approach for Data-centric Scientific Workflows. In Proceedings of the VLDB Endowment (PVLDB), 4(12): 1328-1339, 2011.
[2] E. Ogasawara, D. Jonas, V. Silva, C. Fernando, D. De Oliveira, F. Porto, P. Valduriez, M. Mattoso. Chiron: A Parallel Engine for Algebraic Scientific Workflows. Journal of Concurrency and Computation: Practice and Experience, 2013.
[3] M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems. Third Edition, Springer, ISBN 978-1-4419-8833-1, 2011.
[4] E. Pacitti, R. Akbarinia, M. El Dick. P2P Techniques for Decentralized Applications. Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2012.
[5] B. Nicolae, G. Antoniu, L. Bougé, D. Moise, A. Carpen-Amarie. BlobSeer: Next Generation Data Management for Large Scale Infrastructures. Journal of Parallel and Distributed Computing, 71(2): 168-184, 2011.
Requirements:
- Distributed programming, distributed and parallel data management, programming languages like C++, Java.
- Fluent English (internship stays at MSR Redmond, USA, are planned).
Permanent link to this article: https://team.inria.fr/zenith/phd-2013/
May 07
Zenith seminar: Maximilien Servajean, “Profile Diversity in Search and Recommendation”, May 7, 3pm.
Maximilien will present recent work accepted at a workshop held with WWW 2013. Galéra, room 127.
Title: Profile Diversity in Search and Recommendation
Abstract: We investigate profile diversity, a novel idea in searching scientific documents. Combining keyword relevance with popularity in a scoring function has been the subject of different forms of social relevance. Content diversity has been thoroughly studied in search and advertising, database queries, and recommendations. We believe our work is the first to investigate profile diversity to address the problem of returning highly popular but too-focused documents. We show how to adapt Fagin’s threshold-based algorithms to return the most relevant and most popular documents that satisfy content and profile diversities and run preliminary experiments on two benchmarks to validate our scoring function.
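For readers unfamiliar with the threshold-based algorithms mentioned in the abstract, the following Python sketch shows the plain threshold algorithm (TA) for top-k retrieval over two score lists. It is not the adapted version presented in the talk: it ignores content and profile diversity, and it assumes non-negative scores, sum as the aggregation function, and made-up document identifiers and scores.

```python
import heapq


def threshold_algorithm(lists, k):
    """Plain threshold algorithm with sum aggregation.

    lists: one dict per criterion, mapping document id -> non-negative score.
    Returns the k best (document, total score) pairs, best first.
    """
    # Sorted access: each criterion is read in descending score order.
    sorted_lists = [
        sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])) for scores in lists
    ]
    seen = {}
    depth = max(len(s) for s in sorted_lists)
    for pos in range(depth):
        last_scores = []
        for s in sorted_lists:
            if pos < len(s):
                doc, score = s[pos]
                last_scores.append(score)
                if doc not in seen:
                    # Random access: fetch the document's score in every list.
                    seen[doc] = sum(scores.get(doc, 0) for scores in lists)
            else:
                last_scores.append(0)  # exhausted list: unseen docs score 0 here
        # No unseen document can have a total above this threshold.
        threshold = sum(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top  # early stop: the top-k is final
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical scores on a 0-10 scale.
relevance = {"d1": 9, "d2": 8, "d3": 3, "d4": 1}
popularity = {"d1": 2, "d2": 7, "d3": 9, "d4": 1}

result = threshold_algorithm([relevance, popularity], k=2)
print(result)
```

In this example the algorithm stops after reading three entries from each list, without ever accessing `d4`, because the threshold has dropped below the score of the current second-best document.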
Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-maximilien-servajeanprofile-diversity-in-search-and-recommendation-may-7-3pm/