Seminar of the “Données Connaissances” pole: Manuel Serrano, “From Computers to Tablets: Programming the Diffuse Web”, July 1, 2:30pm.

Seminar of the Données Connaissances pole

Organized by the Zenith team
Monday, July 1, 2:30pm

Room 127, Galera

From Computers to Tablets: Programming the Diffuse Web

Manuel Serrano,

INRIA Sophia Antipolis

Personal computing has been profoundly disrupted by smartphones and tablets. In the space of a few years, these devices have caught up with the personal computers we have been using since the 1980s, in numbers and almost in capability. Because phones are so compact, we carry them with us (almost) everywhere. Because they are also richly connected, to the physical world through a multitude of sensors and to the electronic world through broad network coverage, they enable new applications that were unimaginable just a few years ago: diffuse applications.

Diffuse programming is complex, however, because it combines most of the difficulties of conventional programming with a set of entirely new problems. In this seminar we will present Hop, a programming language designed to address these problems. It relies heavily on the architecture of the Web, which it treats as one vast execution platform. The seminar will begin with a brief historical overview of Web programming techniques, followed by a presentation of the main features of the language. A realistic application will then be presented and a few aspects of its implementation detailed.

Permanent link to this article: https://team.inria.fr/zenith/seminaire-du-pole-donnees-connaissances-manuel-serrano-des-ordinateurs-aux-tablettes-la-programmation-du-web-diffus-1er-juillet-14h30/

Seminar of the “Données Connaissances” pole: Eliya Buyukkaya, “A peer-to-peer-based virtual environment system”, July 1, 11am.

Seminar of the Données et Connaissances pole
July 1, 2013, 11am, room 127, Galera

A peer-to-peer-based virtual environment system
Eliya Buyukkaya – ENSSAT (École Nationale Supérieure des Sciences Appliquées et de Technologie)

Abstract: 
Virtual environments (VEs) are 3-D virtual worlds in which a huge number of participants play roles and interact with their surroundings through virtual representations called avatars. VEs are traditionally supported by a client/server architecture. However, centralized architectures can lead to a bottleneck at the server due to high communication and computation overhead during peak loads. Thus, P2P overlay networks are emerging as a promising architecture for VEs. However, exploiting P2P schemes in VEs is not straightforward, and several challenging issues related to data distribution and state consistency must be considered.

One of the key aspects of P2P-based VEs is the logical platform consisting of connectivity, communication and data architectures, on which the VE is based. The connectivity architecture is the overlay topology structure, which defines how peers are connected to each other. The communication architecture is the routing protocol defining how peers can exchange messages, while the data architecture defines how data are distributed over the logical overlay. The design of these architectures has significant influence on the performance and scalability of VEs.

First, we propose a scalable connectivity architecture based on a new triangulation algorithm reducing maintenance cost of the system. Second, we construct a communication architecture built on top of the connectivity architecture ensuring that each message reaches its intended destination. Finally, we propose a data architecture ensuring the management of data with different characteristics in terms of mobility in the VE, while providing a fair data distribution and low data transfer between peers in the VE.

Permanent link to this article: https://team.inria.fr/zenith/seminaire-du-pole-donnees-connaissances-eliya-buyukkaya-a-peer-to-peer-based-virtual-environment-system-1er-juillet-11h00/

Numev: meeting of the Données axis, Tuesday, May 21, 10:30am–12pm.

The next meeting of the NUMEV Données axis will take place on Tuesday, May 21, 10:30am–12pm, La Galera, room 127.

Program:

  • NUMEV news: calls for projects, workshop
  • Nadine Hilgert (INRA). Some research directions in statistics for functional data around high-throughput phenotyping data (Phenome project). Abstract: Phenotyping platforms generate large amounts of data from measurements of diverse variables over time, on hundreds or thousands of plants. Exploiting these masses of data to produce new knowledge in biology and genetics is a major challenge. The goal is to develop a methodology for analyzing and modeling phenotyping data, in the same spirit as what was done for genotyping with the emergence of bioinformatics. I will present some open research questions and develop possible solutions in statistics for functional data.
  • Maximilien Servajean (LIRMM), Esther Pacitti (LIRMM), Sihem Amer-Yahia (LIG), Pascal Neveu (INRA). Profile diversity in search and recommendation. Abstract: We investigate profile diversity, a novel idea in searching scientific documents. Combining keyword relevance with popularity in a scoring function has been the subject of different forms of social relevance. Content diversity has been thoroughly studied in search and advertising, database queries, and recommendations. We believe our work is the first to investigate profile diversity to address the problem of returning highly popular but too-focused documents. We show how to adapt Fagin’s threshold-based algorithms to return the most relevant and most popular documents that satisfy content and profile diversities, and run preliminary experiments on two benchmarks to validate our scoring function.
  • André Mas (I3M), Pascal Poncelet (LIRMM). Update on the VIPP project.
  • Discussions.

Permanent link to this article: https://team.inria.fr/zenith/numev-reunion-de-laxe-donnees-mardi-21-mai-10h30-12h/

PhD offer: Multisite Management of Data-intensive Scientific Workflows in the Cloud

Directors: Esther Pacitti (University Montpellier 2), Marta Mattoso (UFRJ) and Patrick Valduriez (Inria)
Contact: Patrick.Valduriez@inria.fr
Funding: The joint Microsoft-Inria Research Center
Gross salary: 1957 euros/month (36 months)

This work is part of a new project on advanced data storage and processing for cloud workflows (2013-2017) funded by Microsoft Research, in collaboration with the Kerdata INRIA team. It will be conducted within the Institut de Biologie Computationnelle in Montpellier.

Scientific workflows allow scientists to easily express multi-step computational tasks, for instance, load input data files, preprocess the data, run various analyses, and aggregate the results. A scientific workflow describes the dependencies between tasks, typically as a Directed Acyclic Graph (DAG) where the nodes are tasks (that can call programs) and the edges express the task dependencies. As scientific workflows need to deal more and more with big data, it becomes critical to process them in high-performance computing environments such as clusters or clouds. Some scientific workflow systems such as Pegasus and Swift provide parallel support but with an imperative language, which forces optimization and parallelization to be hardcoded.
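For illustration, such a DAG can be represented as a mapping from each task to the tasks it depends on, with a topological sort yielding a valid sequential execution order. This is only a minimal sketch with hypothetical task names, not the representation used by any particular workflow system:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "load_input": set(),
    "preprocess": {"load_input"},
    "analysis_a": {"preprocess"},
    "analysis_b": {"preprocess"},
    "aggregate": {"analysis_a", "analysis_b"},
}

# A topological order respects every edge of the DAG,
# so each task runs only after all of its dependencies.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

A parallel scheduler would instead run, at each step, every task whose dependencies have all completed (here, `analysis_a` and `analysis_b` could run concurrently).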

To be amenable to automatic optimization and parallel processing, the specification of a workflow should be high-level. Recently [1], we have proposed an algebraic approach for the optimization and parallelization of data-intensive scientific workflows. This approach is based on a workflow algebra with powerful operators such as Filter, Map and Reduce, a set of algebraic transformation rules as a basis for optimization and a parallel execution model. It has been implemented in Chiron [2] in a cluster environment.
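As a rough illustration of the algebraic style (not the actual algebra of [1] or the Chiron engine, which define operators over activity inputs and outputs), Filter, Map and Reduce over in-memory relations can be sketched and composed in a few lines:

```python
from itertools import groupby

# Toy relations: lists of dicts standing in for the algebra's relations.
def op_filter(relation, pred):
    return [t for t in relation if pred(t)]

def op_map(relation, fn):
    return [fn(t) for t in relation]

def op_reduce(relation, key, agg):
    # Group tuples by key, then aggregate each group.
    rel = sorted(relation, key=key)
    return [agg(k, list(g)) for k, g in groupby(rel, key=key)]

data = [{"gene": "a", "score": 3}, {"gene": "a", "score": 5}, {"gene": "b", "score": 1}]
kept = op_filter(data, lambda t: t["score"] > 1)
scaled = op_map(kept, lambda t: {**t, "score": t["score"] * 2})
totals = op_reduce(scaled, key=lambda t: t["gene"],
                   agg=lambda k, g: {"gene": k, "total": sum(t["score"] for t in g)})
print(totals)  # -> [{'gene': 'a', 'total': 16}]
```

Because each operator is side-effect free and declarative, algebraic rewriting rules (e.g. pushing a Filter below a Map) and data-parallel execution of each operator become possible, which is the point of the approach.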

In this thesis, we consider the problem of managing algebraic workflows to run efficiently in a multisite cloud environment, where each site has its own cluster, data and programs. Such an environment is well suited for scientific communities, with groups and labs located at geographically dispersed sites. The problem resembles multisite query processing in distributed and parallel database systems [3,4] and we plan to develop similar techniques for workflow decomposition, optimization and parallelization, dynamic task allocation and efficient management of intermediate data to be exchanged between sites. These techniques will be validated by a prototype implemented using the BlobSeer distributed storage system [5] on Microsoft Azure.

Note: a second Ph.D. position related to the joint project is available in the Kerdata team.

References:

[1] E. Ogasawara, J. F. Dias, D. de Oliveira, F. Porto, P. Valduriez, M. Mattoso. An Algebraic Approach for Data-centric Scientific Workflows. In Proceedings of the VLDB Endowment (PVLDB), 4(12): 1328-1339, 2011.

[2] E. Ogasawara, J. Dias, V. Silva, F. Chirigati, D. de Oliveira, F. Porto, P. Valduriez, M. Mattoso. Chiron: A Parallel Engine for Algebraic Scientific Workflows. Journal of Concurrency and Computation: Practice and Experience, 2013.

[3] M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems. Third Edition, Springer, ISBN 978-1-4419-8833-1, 2011.

[4] E. Pacitti, R. Akbarinia, M. El Dick. P2P Techniques for Decentralized Applications. Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2012.

[5] B. Nicolae, G. Antoniu, L. Bougé, D. Moise, A. Carpen-Amarie. BlobSeer: Next Generation Data Management for Large Scale Infrastructures. Journal of Parallel and Distributed Computing, 71 (2):168-184, 2011.

Requirements

  • Distributed programming, distributed and parallel data management, programming languages such as C++ and Java.
  • Fluent English (internship stays at MSR Redmond, USA, are planned).

Permanent link to this article: https://team.inria.fr/zenith/phd-2013/

Zenith seminar: Maximilien Servajean, “Profile Diversity in Search and Recommendation”, May 7, 3pm.

Maximilien will present recent work, accepted at a workshop held in conjunction with WWW 2013. Galéra, room 127.

Title: Profile Diversity in Search and Recommendation

Abstract: We investigate profile diversity, a novel idea in searching scientific documents. Combining keyword relevance with popularity in a scoring function has been the subject of different forms of social relevance. Content diversity has been thoroughly studied in search and advertising, database queries, and recommendations. We believe our work is the first to investigate profile diversity to address the problem of returning highly popular but too-focused documents. We show how to adapt Fagin’s threshold-based algorithms to return the most relevant and most popular documents that satisfy content and profile diversities, and run preliminary experiments on two benchmarks to validate our scoring function.
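The threshold-based algorithms mentioned above descend from Fagin’s classic Threshold Algorithm (TA) for top-k queries over sorted score lists. A minimal sketch of the classic TA follows, using hypothetical relevance and popularity scores; the paper’s diversity-aware variants are not reproduced here:

```python
import heapq

def threshold_top_k(lists, k, agg=sum):
    """Classic TA: each element of `lists` is a dict doc -> score."""
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    best = {}   # doc -> aggregated score, filled via "random access"
    depth = 0   # current depth of sorted access in every list
    while True:
        frontier = []  # score seen in each list at this depth
        for sl in sorted_lists:
            if depth < len(sl):
                doc, score = sl[depth]
                frontier.append(score)
                if doc not in best:
                    # Random access: look the document up in every list.
                    best[doc] = agg(l.get(doc, 0.0) for l in lists)
        if not frontier:               # all lists exhausted
            break
        threshold = agg(frontier)      # best score any unseen doc could reach
        top = heapq.nlargest(k, best.values())
        if len(top) >= k and top[-1] >= threshold:
            break                      # no unseen doc can enter the top-k
        depth += 1
    return sorted(best, key=best.get, reverse=True)[:k]

relevance  = {"d1": 0.9, "d2": 0.6, "d3": 0.2}
popularity = {"d1": 0.1, "d2": 0.8, "d3": 0.9}
print(threshold_top_k([relevance, popularity], k=2))  # -> ['d2', 'd3']
```

The early-stopping threshold is what makes TA attractive here: the algorithm can return the top-k without scanning every list to the bottom, which is why adapting it to diversity-aware scoring is worthwhile.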

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-maximilien-servajeanprofile-diversity-in-search-and-recommendation-may-7-3pm/

Engineer R&D offer: A Middleware for Scientific Workflows

The objective of the position is to participate in the development of SciFloware, a middleware for executing scientific workflows in a distributed and parallel way. We will build on our experience with the Shared-Data Overlay Network (SON) middleware (http://www-sop.inria.fr/teams/zenith/SON/) and an innovative algebraic approach to the management of scientific workflows.

SciFloware provides a development environment and a runtime environment for scientific workflows, interoperable with existing systems. We validate SciFloware with workflows for analyzing biological data provided by our partners CIRAD, INRA and IRD.

The engineer will participate in the development of the middleware and its coordination language, enabling existing scientific workflows to run in the cloud.
Keywords: Scientific Workflow, Cloud computing, Big Data

Apply online.

Permanent link to this article: https://team.inria.fr/zenith/engineer-rd-offer-a-middleware-for-scientific-workflows/

X-data: new national project on Big Data with Data Publica et al.

The X-Data project is a “projet investissements d’avenir” on big data with Data Publica (leader), Orange, La Poste, EDF, Cinequant, Hurence and INRIA (Indes, Planete and Zenith).

Permanent link to this article: https://team.inria.fr/zenith/x-data-new-national-project-on-big-data-wtih-data-publica-et-al/

New project on advanced data storage and processing for cloud workflows with Microsoft

Zenith engages in a new project funded by Microsoft to work on the problem of advanced data storage and processing for supporting scientific workflows in the cloud. More here.

Permanent link to this article: https://team.inria.fr/zenith/new-project-on-advanced-data-storage-and-processing-for-cloud-workflows-2013-2017-with-microsoft/

IBC seminar: Marta Mattoso, “Big Data Workflows – how provenance can help”, March 25, 2pm.

IBC Seminar

Monday, March 25, 2pm

Room 127, La Galera building

Organized by the Zenith team

Big Data Workflows – how provenance can help
Marta Mattoso
UFRJ, Rio de Janeiro
Brazil

Big data analyses are critical for decision support in business data processing. These analyses involve the execution of many activities, such as programs to explore data from the web, databases, data warehouses and files; data cleaning procedures; programs to aggregate data; core programs that perform the analyses; and tools to visualize and interpret the results. Each step (activity) of the analysis is performed in isolation from the others, and analysts must manually manage the larger life cycle of big data analysis. Big data analyses have started to be represented as pipelines or dataflows. However, current approaches lack features to provide a consistent view of many different explorations and activities as part of a broader analysis, like a computational experiment. Scientific workflows have long provided such features for scientific experiments, and although originally designed for science, they may be useful to support the life cycle of big data analysis. Scientific analyses typically involve experimenting with several steps using different datasets and computer programs. Scientists need to manage the composition, execution and analysis of their experiments carefully, so that the results can be trusted and the experiments are reproducible. To help manage experiments, scientific workflow management systems (SWfMS) have been proposed to let scientists design workflows of different complexities and manage their execution, including high performance computing (HPC) in cloud environments. Most SWfMS also support provenance data. Provenance tracks how the results of an experiment were produced, which is essential to make an experiment (a big data analysis) reproducible and trustworthy. Business process workflows focus on modeling the process rather than on managing big data flows with provenance and HPC. In this talk we discuss provenance support along the big data analysis workflow as a way to improve the results of big data analysis, especially in a long-term view.

Permanent link to this article: https://team.inria.fr/zenith/ibc-seminar-marta-mattoso-big-data-workflows-how-provenance-can-help-march-25-2pm/

IBC seminar: Patrick Valduriez, “Parallel Techniques for Big Data”, March 22, 2pm.

This 6th seminar, organized within axis 5 “Données” of the institute, will take place on Friday, March 22, from 2pm to 3:30pm, at IBC, room 127 (directions: http://g.co/maps/ygsrk):

Patrick Valduriez,
Zenith team, INRIA and LIRMM
http://www-sop.inria.fr/members/Patrick.Valduriez/

Parallel Techniques for Big Data

Big data has become a buzzword, referring to massive amounts of data that are very hard to deal with using traditional data management tools. In particular, the ability to produce high-value information and knowledge from big data makes it critical for many applications such as decision support, forecasting, business intelligence, research, and (data-intensive) science. Processing and analyzing massive, possibly complex data is a major challenge since solutions must combine new data management techniques (to deal with new kinds of data) with large-scale parallelism in cluster, grid or cloud environments. Parallel data processing has long been exploited in the context of distributed and parallel database systems for highly structured data. But big data encompasses different data formats (documents, sequences, graphs, arrays, …) that require significant extensions to traditional parallel techniques. In this talk, I will discuss such extensions, from the basic techniques and architectures to NoSQL systems and MapReduce.
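The MapReduce model mentioned in the abstract can be illustrated with a minimal single-process sketch (real engines distribute the map and reduce phases across a cluster; the word-count example below is the standard illustration, not code from the talk):

```python
from collections import defaultdict

# Minimal sketch of MapReduce: the mapper emits (key, value) pairs,
# a shuffle groups them by key, and the reducer aggregates each group.
def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)
    for record in records:               # map + shuffle
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values)    # reduce
            for key, values in groups.items()}

# Classic word count over document-like records.
docs = ["big data", "big parallel data"]
counts = map_reduce(
    docs,
    mapper=lambda doc: [(w, 1) for w in doc.split()],
    reducer=lambda key, values: sum(values),
)
print(counts)  # -> {'big': 2, 'data': 2, 'parallel': 1}
```

Because the mapper runs independently per record and the reducer independently per key, both phases parallelize naturally, which is the property the talk's extensions for less structured data build on.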

Permanent link to this article: https://team.inria.fr/zenith/ibc-seminar-patrick-valduriez-parallel-techniques-for-big-data-march-22-2pm/