A Data-Centric Language and Execution Model for Scientific Workflows

PhD position

Advisors: Didier Parigot and Patrick Valduriez, Inria

The Zenith team deals with the management of scientific applications that are computation-intensive and manipulate large amounts of data. These applications are often represented by workflows, which describe sequences of tasks (computations) and data dependencies between these tasks. Several scientific workflow environments have already been proposed [3]. However, they have little support for efficiently managing large data sets. The Zenith team is developing an original approach that deals with such large data sets and allows efficient placement of both tasks and data on large-scale (distributed and parallel) infrastructures for more efficient execution. To this end, we propose an original solution that combines the advantages of cloud computing and P2P technologies. This work is part of the IBC project (Institut de Biologie Computationelle – http://www.ibc-montpellier.fr), in collaboration with biologists, in particular from CIRAD and IRD, and cloud providers, in particular Microsoft.

The concept of cloud computing combines several technology advances such as Service-Oriented Architectures, resource virtualization, and novel data management systems referred to as NoSQL. These technologies enable flexible and extensible usage of resources, which is referred to as elasticity. In addition, the Cloud allows users to simply outsource data storage and application execution. For the manipulation of big data, NoSQL database systems such as Google Bigtable, Hadoop HBase, Amazon Dynamo, Apache Cassandra, and 10gen MongoDB have recently been proposed.

Existing scientific workflow environments [3] have been developed primarily to simplify the design and execution of a set of tasks on a particular infrastructure. For example, in the field of biology, the Galaxy environment allows users to introduce catalogs of functions/tasks and to compose these functions with existing ones in order to build a workflow. These environments propose a design approach that we can classify as “process-oriented”, where information about data dependencies (data flow) is purely syntactic. In addition, the targeted execution infrastructures are mostly computation-oriented, like clusters and grids. Finally, the data produced by scientific workflows are often stored in loosely structured files for further analysis. Thus, data management is fairly basic: data are either stored on a centralized disk or directly transferred between tasks. This approach is not suitable for data-intensive applications, because data management, in particular data transfer, becomes the major bottleneck.

As part of a new project that develops a middleware for scientific workflows (SciFloware), the objective of this thesis is to design a declarative, data-centric language for expressing scientific workflows, together with its associated execution model. A declarative language is important to enable automatic optimization and parallelization [1]. The execution model for this language will be decentralized, in order to allow flexible execution in distributed and parallel environments, and will capitalize on the execution models developed in the context of distributed and parallel database systems [2]. To validate this work, a prototype will be implemented using the SON middleware [4] and a distributed file system such as HDFS.
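
To give a concrete (and purely illustrative) flavor of what such a declarative, data-centric specification could look like, the following minimal Python sketch describes a workflow as a composition of algebraic operators over a dataset, in the spirit of [1]. All names (scan, filter, map, reduce, the toy engine) are hypothetical and are not part of SciFloware.

    # Illustrative sketch only: a workflow as a composition of algebraic
    # operators over a dataset. Nothing here is the SciFloware language.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Op:
        name: str                      # algebraic operator: scan, filter, map, reduce
        arg: object                    # dataset name, predicate, task or aggregation function
        child: Optional["Op"] = None   # data dependency (the data flow)

    # Building the workflow is a pure description: nothing is executed here.
    def scan(dataset: str) -> Op:        return Op("scan", dataset)
    def filter_(pred, child: Op) -> Op:  return Op("filter", pred, child)
    def map_(task, child: Op) -> Op:     return Op("map", task, child)
    def reduce_(agg, child: Op) -> Op:   return Op("reduce", agg, child)

    wf = reduce_(sum,
            map_(lambda x: x * x,
                 filter_(lambda x: x % 2 == 0,
                         scan("numbers"))))

    # A trivial sequential engine; a real engine could instead partition the
    # scanned dataset and run the filter/map pipeline on each partition in parallel.
    def execute(op: Op, storage: dict) -> List:
        if op.name == "scan":
            return storage[op.arg]
        data = execute(op.child, storage)
        if op.name == "filter":
            return [x for x in data if op.arg(x)]
        if op.name == "map":
            return [op.arg(x) for x in data]
        return [op.arg(data)]          # reduce

    print(execute(wf, {"numbers": list(range(10))}))   # -> [120]

Because the workflow is only a description of what is computed on which data, an optimizer is free to decide where each operator runs and how the dataset is partitioned, which is precisely what makes automatic optimization and parallelization possible.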

References

[1] E. Ogasawara, J. F. Dias, D. de Oliveira, F. Porto, P. Valduriez, M. Mattoso. An Algebraic Approach for Data-centric Scientific Workflows. Proceedings of the VLDB Endowment (PVLDB), 4(12): 1328-1339, 2011.

[2] M. T. Özsu, P. Valduriez. Principles of Distributed Database Systems. Third Edition, Springer, 2011.

[3] I. J. Taylor, E. Deelman, D. B. Gannon, M. Shields. Workflows for e-Science: Scientific Workflows for Grids. First Edition, Springer, 2007.

[4] A. Ait-Lahcen, D. Parigot. A Lightweight Middleware for developing P2P Applications with Component and Service-Based Principles. 15th IEEE International Conference on Computational Science and Engineering, 2012.

Contact: Didier Parigot (Firstname.Lastname@inria.fr)

Apply online

Permanent link to this article: https://team.inria.fr/zenith/a-data-centric-language-and-execution-model-for-scientific-workflows/

Zenith seminar: Dennis Shasha, “Upstart Puzzles”, January 30, 2013.

Galéra, room 127 at 10:30.

Dr. Dennis Shasha is a professor of Mathematical Sciences in the Department of Computer Science at NYU. Along with research and teaching in biological computing, pattern recognition, database tuning, cryptographic file systems, and the like, Dennis is well known for his mathematical puzzle column for Dr. Dobb’s, whose readers are very sharp, and his Puzzling Adventures column for Scientific American. His puzzle writing has given birth to fictional books about a mathematical detective named Dr. Ecco. Dr. Shasha has also co-authored numerous highly technical books. Dennis speaks often at conferences and is a tireless promoter of “mensa-like” puzzles.

More details at www.cs.nyu.edu/shasha

Title: Upstart Puzzles

Abstract: The writer of puzzles often invents puzzles to illustrate a principle. The puzzles, however, sometimes have other ideas. They speak up and say that they would be so much prettier as slight variants of their original selves.

The dilemma is that the puzzle inventor sometimes can’t solve those variants. Sometimes he finds out that his colleagues can’t solve them either, because there is no existing theory for solving them. At that point, these sassy variants deserve to be called upstarts.

We discuss a few upstarts originally inspired by the Falklands/Malvinas War, zero-knowledge proofs, hikers in Colorado, and city planning. They have given a good deal of trouble to a certain mathematical detective whom I know well.

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-dennis-shashaupstart-puzzles-january-30-2013/

Zenith scientific seminar: Dennis Shasha, “Storing Clocked Programs Inside DNA: A Simplifying Framework for Nanocomputing”, January 28, 2013.

Galéra, room 127 at 2:30pm.

Dr. Dennis Shasha is a professor of Mathematical Sciences in the Department of Computer Science at NYU. Along with research and teaching in biological computing, pattern recognition, database tuning, cryptographic file systems, and the like, Dennis is well known for his mathematical puzzle column for Dr. Dobb’s, whose readers are very sharp, and his Puzzling Adventures column for Scientific American. His puzzle writing has given birth to fictional books about a mathematical detective named Dr. Ecco. Dr. Shasha has also co-authored numerous highly technical books. Dennis speaks often at conferences and is a tireless promoter of “mensa-like” puzzles.

More details at www.cs.nyu.edu/shasha

Title: Storing Clocked Programs Inside DNA: A Simplifying Framework for Nanocomputing

Abstract: In the history of modern computation, large mechanical calculators preceded computers. A person would sit there punching keys according to a procedure and a number would eventually appear. Once calculators became fast enough, it became obvious that the critical path was the punching rather than the calculation itself. That is what made the stored program concept vital to further progress. Once the instructions were stored in the machine, the entire computation could run at the speed of the machine.

This work shows how to do the same thing for DNA computing. Rather than asking a robot or a person to pour in specific strands at different times in order to cause a DNA computation to occur (by analogy to a person punching numbers and operations into a mechanical calculator), the DNA instructions are stored within the solution and guide the entire computation. We show how to store straight line programs, conditionals, loops, and a rudimentary form of subroutines. We propose a novel machine motif which constitutes an instruction stack, allowing for the clocked release of an arbitrary sequence of DNA instruction or data strands. The clock mechanism is built of special strands of DNA called “tick” and “tock.” Each time a “tick” and “tock” enter a DNA solution, a strand is released from an instruction stack (by analogy to the way in which a clock cycle in an electronic computer causes a new instruction to enter a processing unit). As long as there remain strands on the stack, the next cycle will release a new instruction strand. Regardless of the actual strand or component to be released at any particular clock step, the “tick” and “tock” fuel strands remain the same, thus shifting the burden of work away from the end user of the machine and easing operation. Pre-loaded stacks enable the concept of a stored program to be realized as a physical DNA mechanism.
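
To make the control flow of this mechanism concrete, here is a toy software abstraction (a minimal Python sketch of our own, not the authors’ model): strands are plain strings, all chemistry is ignored, and the only point illustrated is that each generic tick/tock cycle releases the next pre-loaded instruction, whatever it is.

    # Toy abstraction of the clocked instruction stack; strings stand in for strands.
    class InstructionStack:
        def __init__(self, strands):
            self.strands = list(strands)          # pre-loaded program; top of stack at the end

        def clock_cycle(self, fuel1, fuel2):
            # The same generic fuel strands are used at every cycle.
            assert (fuel1, fuel2) == ("tick", "tock")
            return self.strands.pop() if self.strands else None

    program = ["load X", "add Y", "store Z"]      # hypothetical instruction strands
    stack = InstructionStack(reversed(program))   # pre-load so strands release in program order

    released = []
    while (strand := stack.clock_cycle("tick", "tock")) is not None:
        released.append(strand)                   # in the real system, the strand drives the computation

    print(released)                               # ['load X', 'add Y', 'store Z']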

We demonstrate by a series of experiments conducted in Ned Seeman’s lab that it is possible to “initialize” a clocked stored program DNA machine. We end with a discussion of the design features of a programming language for clocked DNA programming. There is a lot left to do.

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-dennis-shashastoring-clocked-programs-inside-dna-a-simplifying-framework-for-nanocomputing-january-28-2013/

A Decentralized Management Approach for Data-intensive Scientific Workflows

PhD thesis

Director: Patrick Valduriez, senior researcher at INRIA

Supervisor: Hinde Bouziane, associate professor at University Montpellier 2

The ZENITH team is interested in the management of scientific applications that are computation-intensive and manipulate large amounts of data. These applications are often represented by workflows, which describe sequences of tasks (computations) and data dependencies between these tasks. Several scientific workflow environments have already been proposed [3]. However, they have no support for efficiently managing large data sets. Our team aims to develop an original approach that deals with such large data sets and allows efficient placement of both tasks and data on large-scale infrastructures for more efficient execution. To this end, we propose an original solution that combines the advantages of cloud computing and P2P technologies. This work will be part of the IBC project (Institut de Biologie Computationelle – http://www.ibc-montpellier.fr) and will be done in collaboration with biologists, especially from CIRAD, and cloud providers, in particular Microsoft.

The recent concept of cloud computing mainly relies on the following technology advances: Service-Oriented Architectures, resource virtualization (computation and storage), and modern data management systems referred to as NoSQL. These technologies enable a flexible and extensible (elastic) usage of resources that is inherent to the Cloud. In addition, the Cloud allows users to simply outsource data storage and application execution. For the manipulation of big data, NoSQL database systems such as Google Bigtable, Hadoop HBase, Amazon Dynamo, Apache Cassandra, and 10gen MongoDB have recently been proposed.

Existing scientific workflow environments [3] have been developed primarily to simplify the design and execution of a set of tasks on parallel and distributed infrastructures. For example, in the field of biology, the Galaxy environment allows users to introduce catalogs of functions/tasks and to compose these functions with other existing ones in order to build a workflow. These environments thus propose a design approach that we can classify as “process-oriented”, where the information about data dependencies (data flow) is purely syntactic. In addition, the targeted execution infrastructures are mostly computation-oriented, like clusters and grids. Finally, the data analyzed and produced by a scientific workflow are often stored in loosely structured files, and only simple, classical mechanisms are used to manage them: they are either stored on a centralized disk or directly transferred between tasks. This approach is not suitable for data-centric applications, where data transfers become a costly bottleneck.

The main goal of this thesis is to propose an original approach for partitioning and mapping data and tasks to distributed resources. This approach will be based on a declarative (data-flow oriented) specification of a workflow, on recent NoSQL approaches for data distribution and management, and on a Service-Oriented Architecture for flexibility. Moreover, a dynamic approach is necessary to take into account the elasticity of the cloud. It will rely on a P2P architecture, in particular SON [4], currently under development in ZENITH.

A declarative specification has the advantage of being able to express a composition of algebraic operators (map, reduce, split, filter, etc.) on the manipulated data, from which it is possible to automatically parallelize tasks and to optimize the placement of data and tasks. A first algebraic specification for distributed scientific workflows has been proposed by ZENITH [1]. This specification is close to the one used in distributed and parallel database management systems [2], which have been widely studied. We aim to rely on such systems to influence task placement (and scheduling) depending on the data placement managed by NoSQL systems.
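
As a purely illustrative sketch of why such an algebraic specification lends itself to automatic parallelization, the following Python fragment splits a dataset into partitions, runs a filter/map pipeline independently on each partition (a stand-in for running it on the node that holds that partition in a NoSQL store or HDFS), and reduces the partial results. The partitioning and scheduling policy shown here is hypothetical.

    # Sketch: split + parallel filter/map per partition + reduce of partial results.
    from concurrent.futures import ProcessPoolExecutor

    def split(data, n):                 # partition the input dataset into n chunks
        return [data[i::n] for i in range(n)]

    def pipeline(partition):            # filter + map applied to one partition
        return [x * x for x in partition if x % 2 == 0]

    def reduce_(partials):              # merge the partial results
        return sum(sum(p) for p in partials)

    if __name__ == "__main__":
        data = list(range(1000))
        with ProcessPoolExecutor(max_workers=4) as pool:   # stand-in for 4 nodes
            partials = list(pool.map(pipeline, split(data, 4)))
        print(reduce_(partials))        # same result as the sequential composition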

P2P architectures have the benefit of being autonomous, dynamic and scalable. The aim of this thesis is thus to propose placement algorithms that are decentralized and able to take decisions guided by observed changes within the execution environment, data placement and performance requirements (e.g. a maximum execution time constraint). In this context, we can also rely on existing work on adaptation, e.g. [7].

To validate this work, a prototype will be implemented using the SON middleware [4] and a distributed file system like HDFS (Hadoop Distributed File System). It will then be integrated into a workflow environment like Galaxy (used by CIRAD scientists and researchers, with whom we collaborate). Experiments will be performed on the Grid5000 platform and on a cloud environment, in particular Microsoft Azure.

Permanent link to this article: https://team.inria.fr/zenith/une-approche-decentralisee-elastique-pour-des-workflows-scientifiques-orientes-donnees/

Zenith seminar: Florent Masseglia, “Big Data Mining”, December 13, 2012.

Florent Masseglia will present a survey on Big Data Mining on December 13 at 10:30am, in room G.127.

Title: Big Data Mining

Abstract: In this talk I will adopt a “binary” point of view on big data (i.e. data streams, and data that is so large and complex that it has to be managed in a cloud). Then I will focus on three problems of data mining: clustering, frequent itemsets, and frequent sequential patterns. I will give an overview of well-known techniques for discovering them in large amounts of data, and I will present a few clues on how to solve these problems over data streams and with cloud computing.

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-florent-massegliabig-data-mining-december-13-2012/

Patrick Valduriez named ACM Fellow

The prestigious distinction from the Association for Computing Machinery (ACM) was recently awarded to a French national for the third time. This is a major honour for Patrick Valduriez, Senior Researcher at Inria and leader of the Zenith joint project-team with LIRMM* in Montpellier.

As one of the most influential computing societies in the scientific and educational world, the ACM awards the title of ACM Fellow every year to a few of its members for their outstanding contributions to computer science, at the origin of fundamental knowledge and technological progress. It signifies international recognition at the highest level by one’s peers.

For more information:

http://www.inria.fr/en/centre/sophia/news/patrick-valduriez-named-acm-fellow

(*) The Montpellier Laboratory of Informatics, Robotics, and Microelectronics (cross-faculty research entity of the University of Montpellier 2 (UM2) and the National Center for Scientific Research (CNRS) – Institut des sciences informatiques et de leurs interactions (INS2I))

Permanent link to this article: https://team.inria.fr/zenith/patrick-valduriez-named-acm-fellow/

Workshop Mastodons@Montpellier: Large-Scale Data Management in Life Sciences

Friday, December 7, 2012, from 9am to 5pm

Venue: Salle des séminaires, LIRMM, 161 rue Ada, 34392 Montpellier

Contacts: Esther.Pacitti@lirmm.fr and Eric.Rival@lirmm.fr

Website (registration): http://www.lirmm.fr/~pacitti/Mastodons.html

Biology and its applications, from medicine to agronomy and ecology, are becoming sciences that produce massive amounts of data and require new computational approaches to analyze and share these data.

High-Throughput Sequencing technologies, which appeared in 2005 and are also known as Next-Generation Sequencing (NGS), are revolutionizing the way research questions in life science are posed and solved. They make it possible to study genomic diversity within a species, gene expression in cells, and epigenetic marks on the genome. The resulting volumes of sequences bring these sciences into the realm of “Big Data” and raise huge challenges for exploiting these data.

In plant science, quantitative genetics methods make it possible to identify the genes involved in phenotypic variations in response to environmental conditions. They produce large amounts of data (e.g., 10^5 data points per day) at different time intervals (from minutes to days), at different sites and at different scales, from small tissue samples to the whole plant.

This interdisciplinary workshop will bring together researchers from data processing, bioinformatics, ecophysiology, biology and other fields, in order to deepen the discussions on large-scale data management and on aspects specific to data processing for high-throughput sequencing, plant phenotyping, etc., and to identify research perspectives for 2013.

Program

9:00 Welcome

Session: Large-Scale Data Management Techniques
9:30 Invited talk: Jens Dittrich (Saarland University): Efficient Big Data Processing in Hadoop MapReduce
10:30 P. Valduriez (INRIA & IBC, LIRMM): Parallel Techniques for Big Data Management

11:00 Break

Session: Large-Scale Phenotyping
11:30 F. Tardieu (INRA, Montpellier): Data Management in Plant Phenotyping: the roles of plants and crop models
12:00 Godin (INRIA): Toward high-throughput imaging for studying organismal development

12:30 Lunch

Session: Phenotyping Data Processing
14:00 E. Pacitti, M. Servajean (INRIA & LIRMM): Challenges on Phenotyping Data Sharing and a Case Study
14:30 F. Masseglia (INRIA & LIRMM), F. Tardieu (INRA): Data Mining: current approaches and questions in plant phenotyping

14:45 Short break

Session: Large-Scale Data and Sequencing
15:00 E. Rivals (IBC & LIRMM – CNRS & UM2): Challenges in the Analysis of High Throughput Sequencing Data
15:30 A. Chateau (IBC & LIRMM, UM2): Genome Assembly Verification

16:00 Short break

16:15 – 17:00 Discussion

Permanent link to this article: https://team.inria.fr/zenith/workshop-mastodonsmontpellier-gestion-de-donnees-a-grande-echelle-en-science-de-la-vie/

Zenith scientific seminar: Marta Mattoso, “Exploring Provenance in High Performance Scientific Computing”, December 6, 2012.

Marta Mattoso has been a Professor in the Department of Computer Science at the COPPE Institute of the Federal University of Rio de Janeiro (UFRJ) since 1994, where she leads the Distributed Database Research Group. She received her D.Sc. degree from UFRJ. Dr. Mattoso has been active in the database research community for more than twenty years, and her current research interests include distributed and parallel databases and data management aspects of scientific workflows.

Title: Exploring Provenance in High Performance Scientific Computing

Abstract: Large-scale scientific computations are often organized as a composition of many computational tasks linked through data flows. After the completion of a computational scientific experiment, a scientist has to analyze its outcome, for instance by checking inputs and outputs of the computational tasks that are part of the experiment. This analysis can be automated using provenance management systems that describe, for instance, the production and consumption relationships between data artifacts, such as files, and the computational tasks that compose the scientific application. Due to their exploratory nature, large-scale experiments often present iterations that evaluate a large space of parameter combinations. In this case, scientists need to analyze partial results during execution and dynamically interfere with the next steps of the simulation. Features such as user steering of workflows, to track, evaluate and adapt the execution, need to be designed to support iterative methods. In this talk we show examples of iterative methods, such as uncertainty quantification, reduced-order models, CFD simulations and bioinformatics. We discuss challenges in gathering, storing and querying provenance as structured data enriched with information about the runtime behavior of computational tasks in high performance computing environments. We also show how provenance can enable interesting and useful queries to correlate computational resource usage, scientific parameters, and data set derivation. We briefly describe how the provenance of many-task scientific computations is specified and coordinated by current workflow systems on large clusters and clouds.
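
As a purely illustrative example of the kind of derivation and correlation queries that provenance enables, the following minimal Python sketch records production/consumption relationships between tasks and files, plus some runtime metadata, and traverses them backwards from a result file. The data model and query are ours, not those of any particular provenance system.

    # Hypothetical provenance store: each record is
    # (task, consumed files, produced files, runtime metadata).
    provenance = [
        ("extract",  ["raw.dat"],   ["clean.dat"],  {"cpu_h": 0.2, "param_k": 10}),
        ("simulate", ["clean.dat"], ["run1.out"],   {"cpu_h": 5.1, "param_k": 10}),
        ("analyze",  ["run1.out"],  ["figure.png"], {"cpu_h": 0.4, "param_k": 10}),
    ]

    def derivation(target, records):
        """All tasks and files that the target file (transitively) derives from."""
        tasks, files, frontier = [], set(), {target}
        changed = True
        while changed:
            changed = False
            for task, consumed, produced, meta in records:
                if frontier & set(produced) and task not in [t for t, _ in tasks]:
                    tasks.append((task, meta))
                    files |= set(consumed)
                    frontier |= set(consumed)
                    changed = True
        return tasks, files

    tasks, files = derivation("figure.png", provenance)
    print(files)                                    # input files figure.png depends on
    print(sum(meta["cpu_h"] for _, meta in tasks))  # total CPU hours along the derivation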

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-marta-mattosoexploring-provenance-in-high-performance-scientific-computing-december-6-2012/

Zenith scientific seminar: Duy Hoa Ngo, “Enhancing Ontology Matching by Using Machine Learning, Graph Matching and Information Retrieval Techniques”, December 3, 2012.

Hoa will give a talk about his thesis work on Ontology Matching. He will defend his thesis a few days later (date to be announced).

Title: Enhancing Ontology Matching by Using Machine Learning, Graph Matching and Information Retrieval Techniques

Abstract: In recent years, ontologies have attracted a lot of attention in the Computer Science community, especially in the Semantic Web field. They serve as explicit conceptual knowledge models and provide the semantic vocabularies that make domain knowledge available for exchange and interpretation among information systems. However, due to the decentralized nature of the Semantic Web, ontologies are highly heterogeneous. This heterogeneity mainly causes variations in meaning or ambiguity in entity interpretation and, consequently, prevents domain knowledge from being shared. Therefore, ontology matching, which discovers correspondences between semantically related entities of ontologies, becomes a crucial task in Semantic Web applications.

Several challenges to the field of ontology matching have been outlined in recent research. Among them, the selection of appropriate similarity measures and the tuning of their combination are known to be fundamental issues that the community should deal with. In addition, verifying the semantic coherence of the discovered alignment is also a crucial task. Furthermore, the difficulty of the problem grows with the size of the ontologies.

To deal with these challenges, in this thesis we propose a novel matching approach which combines different techniques from the fields of machine learning, graph matching and information retrieval in order to enhance ontology matching quality. We make use of information retrieval techniques to design new effective similarity measures for comparing labels and context profiles of entities at the element level. We also apply a graph matching method, named similarity propagation, at the structure level, which effectively discovers mappings by exploring the structural information of entities in the input ontologies. To combine similarity measures at the element level, we transform the ontology matching task into a classification task in machine learning. In addition, we propose a dynamic weighted sum method to automatically combine the matching results obtained from the element-level and structure-level matchers. In order to remove inconsistent mappings, we design a new fast semantic filtering method. Finally, to deal with large-scale ontology matching tasks, we propose two candidate selection methods to reduce the computational space.
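
As an illustration of the weighted-sum combination idea (the exact dynamic weighting scheme of YAM++ is not described in this abstract), the following minimal Python sketch combines element-level and structure-level similarity scores, weighting each matcher by its average confidence on the candidate pairs; the weighting rule and the threshold are placeholders, not the thesis method.

    # Sketch of a weighted-sum combination of two matchers' similarity scores.
    def combine(element_scores, structure_scores, threshold=0.6):
        """element_scores / structure_scores: {(entity1, entity2): similarity in [0, 1]}"""
        pairs = set(element_scores) | set(structure_scores)

        # Give more weight to the matcher that is, on average, more confident here.
        w_elem = sum(element_scores.values()) / max(len(element_scores), 1)
        w_struct = sum(structure_scores.values()) / max(len(structure_scores), 1)
        total = (w_elem + w_struct) or 1.0

        alignment = {}
        for p in pairs:
            s = (w_elem * element_scores.get(p, 0.0) +
                 w_struct * structure_scores.get(p, 0.0)) / total
            if s >= threshold:                 # keep only sufficiently similar pairs
                alignment[p] = s
        return alignment

    elem = {("Paper", "Article"): 0.9, ("Author", "Writer"): 0.7}
    struct = {("Paper", "Article"): 0.8, ("Venue", "Event"): 0.5}
    print(combine(elem, struct))               # keeps ("Paper", "Article") only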

All these contributions have been implemented in a prototype named YAM++. To evaluate our approach, we adopt various tracks, namely Benchmark, Conference, Multifarm, Anatomy, Library and Large Biomedical Ontologies, from the OAEI campaign. The experimental results show that the proposed matching methods work effectively. Moreover, in comparison with other participants in the OAEI campaigns, YAM++ proved to be highly competitive and achieved a high ranking.

Permanent link to this article: https://team.inria.fr/zenith/zenith-scientific-seminar-duy-hoa-ngoenhancing-ontology-matching-by-using-machine-learning-graph-matching-and-information-retrieval-techniques-december-3-2012/

Time series clustering in agronomy: grouping plants to study them better.

M2R Computer Science internship topic, 2012-13.

Florent Masseglia, Inria-Lirmm, florent.masseglia@inria.fr

François Tardieu, Inra, francois.tardieu@supagro.inra.fr

Patrick Valduriez, Inria-Lirmm, patrick.valduriez@inria.fr

The more a plant is watered and exposed to light, the more it grows… This “analysis” is not very informative, especially for agronomy research, which demands finer results from the data it produces. Unfortunately, such truisms dominate some studies, because they are highly characteristic of reality. This dominance is an obstacle to discovering finer and more instructive knowledge in these data, in particular in the field of phenotyping.

Phenotyping studies the relationships between the genotype (the genetic heritage) and the phenotype (the behavior) of plants, under several environmental scenarios. In other words, the goal is to compare the development of several genetic varieties of a plant in the same environment. This comparison helps to better understand certain characteristics of plants (production capacity, resistance to climatic conditions, etc.) as a function of their varieties.

To study these reactions, each genotype is represented several times (e.g., 3 to 10 plants) in order to reduce the risk of statistical outliers. The set of plants sharing the same genotype is hereafter called an “accession”. The PhénoArch platform allows the analysis of 1650 plants, corresponding to 100 – 400 accessions depending on the number of experimental treatments. The platform collects information about the plants and their environment at regular intervals. The data produced by the PhénoArch platform take the form of time series (measurements taken at regular intervals) and may concern the environment (e.g., lighting, air temperature, humidity) or variables measured directly on the plants (e.g., growth, number of leaves, transpiration).

Analyzing these time series is both a scientific challenge for phenotyping and a technical challenge for computer science research.

Data cleaning

The data produced by the platform concern accessions, each represented by several plants. A first problem when analyzing these data is to clean the data coming from plants with a “deviant” behavior (a plant, among the 3 to 6 representing its accession, that behaves abnormally). At this stage, a first set of tools would help detect these outliers. One option is to define a distance between series in order to detect whether one of them departs markedly from the rest. This detection would then be used as an “alarm” by the experts, to better target the data to examine in the subsequent analysis.
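
As a minimal sketch of such an alarm (the distance and threshold below are illustrative choices only), one can compare each plant’s series to the point-wise median series of its accession and flag the plants whose distance is abnormally large:

    # Flag deviant plants within one accession; all plants have series of equal length.
    import statistics

    def flag_deviant(series_by_plant, factor=2.0):
        """series_by_plant: {plant_id: [measurements at regular time steps]}"""
        n = len(next(iter(series_by_plant.values())))
        median_series = [statistics.median(s[t] for s in series_by_plant.values())
                         for t in range(n)]

        def dist(s):   # Euclidean distance to the accession's median series
            return sum((a - b) ** 2 for a, b in zip(s, median_series)) ** 0.5

        d = {plant: dist(s) for plant, s in series_by_plant.items()}
        typical = statistics.median(d.values())
        return [plant for plant, v in d.items() if v > factor * typical]

    accession = {
        "plant_1": [1.0, 1.2, 1.5, 1.9],
        "plant_2": [1.1, 1.3, 1.6, 2.0],
        "plant_3": [1.0, 0.6, 0.4, 0.2],   # abnormal growth curve
    }
    print(flag_deviant(accession))          # -> ['plant_3']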

Data analysis

Once cleaned, the data of the individual plants can be used to derive data characterizing an accession, as a form of generalization. In other words, from the time series of the 3 to 6 plants of an accession, one can obtain a single series (a kind of aggregated series for that accession). With one series per accession, one can then cluster the set of time series associated with these accessions.
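
A minimal sketch of this two-step analysis could look as follows; the point-wise mean for aggregation and k-means with Euclidean distance for clustering are simple placeholder choices, and finding better ones is precisely part of the internship work.

    # (1) aggregate replicate plants per accession, (2) cluster the accession series.
    import random

    def aggregate(plants):                       # plants: list of equal-length series
        return [sum(vals) / len(vals) for vals in zip(*plants)]

    def kmeans(series, k, iters=20, seed=0):
        random.seed(seed)
        centers = random.sample(series, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for s in series:                     # assign each series to the nearest center
                i = min(range(k), key=lambda c: sum((a - b) ** 2
                                                    for a, b in zip(s, centers[c])))
                clusters[i].append(s)
            centers = [aggregate(c) if c else centers[i] for i, c in enumerate(clusters)]
        return clusters

    accessions = {
        "A": [[1, 2, 3], [1.1, 2.1, 3.2]],       # replicate plants per accession
        "B": [[1, 2, 2.9], [0.9, 1.9, 3.0]],
        "C": [[3, 2, 1], [3.1, 2.2, 0.8]],
    }
    per_accession = {name: aggregate(p) for name, p in accessions.items()}
    print(kmeans(list(per_accession.values()), k=2))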

The work of this internship consists of three main steps:

  1. State of the art. The student will produce a state of the art on time series analysis, covering discretization, regression, and clustering.

  2. Application of a state-of-the-art technique (chosen together with the advisors) to a real dataset from the PhénoArch platform, considering a single phenotypic variable (e.g., growth). The student will implement the selected technique in C/C++ or Java.

  3. Proposal of a method that takes several variables into account in the clustering process.

Permanent link to this article: https://team.inria.fr/zenith/clustering-de-series-temporelles-en-agronomie-regrouper-les-plantes-pour-mieux-les-etudier/