Nov 03

Post-doc position: Similarity Search in Large Scale Time Series

Title: Similarity Search in Large Scale Time Series

We are seeking a postdoctoral fellow in time series analytics, in collaboration with Safran ( https://www.safran-group.com/ ).

Topic

Nowadays, sensor technology is improving, and the number of sensors used to collect information from different environments, e.g., from critical systems such as airplane engines, is increasing. This widespread use of sensors produces large-scale data, usually in the form of time series. With such complex and massive sets of time series, fast and accurate similarity search is key to many data mining tasks, such as shapelet discovery, motif discovery, classification, and clustering.

This postdoc position is proposed in the context of a collaboration between the INRIA Zenith team and Safran (a multinational company specializing in aircraft and rocket engines). We are interested in correlation detection over multidimensional time series, e.g., those generated by engine check tests. For instance, given a time slice of a very large time series (generated using a set of input parameters), the objective is to quickly detect the most similar time slice, and thereby find the input parameter values that generate similar outputs.
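To make the search task concrete, here is a minimal brute-force sketch. It assumes z-normalized Euclidean distance as the similarity measure (a common choice for time series, not one specified in this posting), and the names `znorm` and `nearest_slice` are hypothetical. It compares the query against every slice in O(n·m) time, which is precisely the cost that scalable indexes aim to avoid on very large series.

```python
import math
from typing import List, Optional, Sequence, Tuple


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a slice (zero mean, unit variance); a flat slice maps to zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    if sd == 0.0:
        return [0.0] * n
    return [(v - mu) / sd for v in x]


def nearest_slice(series: Sequence[float], query: Sequence[float],
                  exclude: Optional[Tuple[int, int]] = None) -> Tuple[int, float]:
    """Brute-force search for the slice of `series` closest to `query`.

    Returns (start index, z-normalized Euclidean distance). `exclude` is an
    optional half-open range [lo, hi) of start positions to skip, e.g. the
    positions overlapping the query itself when it was cut from `series`.
    """
    m = len(query)
    q = znorm(query)
    best_idx, best_dist = -1, math.inf
    for i in range(len(series) - m + 1):
        if exclude is not None and exclude[0] <= i < exclude[1]:
            continue
        s = znorm(series[i:i + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))
        if d < best_dist:
            best_idx, best_dist = i, d
    return best_idx, best_dist


# Toy example: the slice starting at 8 has the query's shape, scaled by 2.
series = [0, 0, 1, 3, 1, 0, 0, 0, 0, 2, 6, 2, 0, 0]
query = series[1:6]
print(nearest_slice(series, query, exclude=(0, 6)))
```

Because both sides are z-normalized, the match at position 8 is found even though its amplitude differs from the query's; whether amplitude should matter is an application choice.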

One of the distinguishing features of our underlying application is the huge volume of data to be analyzed. To deal with such datasets, we intend to develop scalable solutions that take advantage of parallel frameworks (such as MapReduce, Spark, or Flink), which allow us to build efficient parallel data mining systems on commodity machines. We will capitalize on our recent projects, in which we developed parallel solutions for indexing and analyzing very large datasets, e.g., [YAMP2017, SAM2017, SAM2015, AHMP2015].
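The partition-and-merge pattern that such frameworks parallelize can be sketched on a single machine. This is a minimal illustration under stated assumptions, not the team's implementation: it assumes z-normalized Euclidean distance, uses Python's built-in `map` and `functools.reduce` as stand-ins for a framework's map and reduce steps, and all function names are hypothetical. The detail worth noticing is that each partition reads m - 1 extra points past its last start position, so a slice that crosses a partition boundary is never missed.

```python
import math
from functools import reduce
from typing import List, Sequence, Tuple

Match = Tuple[float, int]  # (distance, global start index)


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a slice (zero mean, unit variance); a flat slice maps to zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [0.0] * n if sd == 0.0 else [(v - mu) / sd for v in x]


def make_partitions(n: int, m: int, num_parts: int) -> List[Tuple[int, int]]:
    """Split the valid start positions 0..n-m into contiguous ranges [lo, hi)."""
    if m > n or num_parts < 1:
        raise ValueError("need m <= n and num_parts >= 1")
    starts = n - m + 1
    step = math.ceil(starts / num_parts)
    return [(lo, min(lo + step, starts)) for lo in range(0, starts, step)]


def map_partition(series: Sequence[float], query: Sequence[float],
                  part: Tuple[int, int]) -> Match:
    """'Map' step: best match among the start positions of one partition.

    The worker only needs series[lo : hi + m - 1]; the m - 1 extra points
    overlap the next partition so that slices crossing a boundary are kept.
    """
    lo, hi = part
    m = len(query)
    q = znorm(query)
    chunk = series[lo:hi + m - 1]
    best: Match = (math.inf, -1)
    for i in range(hi - lo):
        s = znorm(chunk[i:i + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))
        best = min(best, (d, lo + i))  # report a global, not local, index
    return best


def parallel_nearest(series: Sequence[float], query: Sequence[float],
                     num_parts: int = 4) -> Match:
    """Run the map step on every partition, then 'reduce' to the global minimum."""
    parts = make_partitions(len(series), len(query), num_parts)
    local_bests = map(lambda p: map_partition(series, query, p), parts)
    return reduce(min, local_bests)


# Plant a pattern whose slice (points 9..12) crosses the partition boundary at 10.
series = [math.sin(0.7 * t) for t in range(20)]
series[9:13] = [1.0, 5.0, 2.0, 7.0]
print(parallel_nearest(series, [1.0, 5.0, 2.0, 7.0], num_parts=4))
```

In a real framework, each partition would be processed by a separate worker, and only one (distance, index) pair per partition would need to be merged, which keeps the reduce step cheap regardless of series length.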

One possible approach to scalable correlation detection in this project is to build on related work, including the matrix profile index [YZUB+2016], applied to time series generated by thousands of sensors. One task in this project will be to develop distributed solutions for constructing and exploiting such indexes over large-scale time series coming from massively distributed sensors.
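For readers new to the matrix profile [YZUB+2016]: for each length-m subsequence it stores the distance to its nearest non-trivial neighbor, together with that neighbor's position (the profile index), so the lowest values mark motifs (repeated patterns) and the highest mark discords. The sketch below computes it by brute force in O(n²·m) time purely to show what the index contains. It is not the faster algorithms of [YZUB+2016]; the z-normalized Euclidean distance, the exclusion-zone width `m // 4`, and the function names are assumptions for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def znorm(x: Sequence[float]) -> List[float]:
    """Z-normalize a slice (zero mean, unit variance); a flat slice maps to zeros."""
    n = len(x)
    mu = sum(x) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return [0.0] * n if sd == 0.0 else [(v - mu) / sd for v in x]


def naive_matrix_profile(series: Sequence[float], m: int,
                         excl: Optional[int] = None) -> Tuple[List[float], List[int]]:
    """Brute-force matrix profile in O(n^2 * m) time.

    profile[i] is the z-normalized Euclidean distance from the slice starting
    at i to its nearest neighbor; index[i] is that neighbor's start. Pairs
    with |i - j| <= excl are trivial matches (overlapping slices) and skipped.
    """
    if excl is None:
        excl = max(1, m // 4)  # assumed exclusion-zone width
    k = len(series) - m + 1
    slices = [znorm(series[i:i + m]) for i in range(k)]
    profile = [math.inf] * k
    index = [-1] * k
    for i in range(k):
        for j in range(i + excl + 1, k):  # visit each unordered pair once
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(slices[i], slices[j])))
            if d < profile[i]:
                profile[i], index[i] = d, j
            if d < profile[j]:
                profile[j], index[j] = d, i
    return profile, index


# Toy series with the same 5-point pattern planted at positions 3 and 20.
series = [math.sin(0.7 * t) for t in range(30)]
pattern = [1.0, 5.0, 2.0, 7.0, 3.0]
series[3:8] = pattern
series[20:25] = pattern
profile, index = naive_matrix_profile(series, m=5)
motif = min(range(len(profile)), key=profile.__getitem__)    # lowest value: motif
discord = max(range(len(profile)), key=profile.__getitem__)  # highest value: discord
print(motif, index[motif], discord)
```

The double loop over all pairs is what makes exact matrix profile computation expensive on long series, and it is the part a distributed solution would need to split across machines.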

[YAMP2017] Djamel-Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Themis Palpanas. DPiSAX: Massively Distributed Partitioned iSAX. IEEE International Conference on Data Mining (ICDM), 2017.

[SAM2017] Saber Salah, Reza Akbarinia, Florent Masseglia. Data placement in massively distributed environments for fast parallel mining of frequent itemsets. Knowledge and Information Systems (KAIS), 53(1), 207-237, 2017.

[SAM2015] Saber Salah, Reza Akbarinia, Florent Masseglia. Fast Parallel Mining of Maximally Informative k-Itemsets in Big Data. IEEE International Conference on Data Mining (ICDM), 2015.

[AHMP2015] Tristan Allard, Georges Hébrail, Florent Masseglia, Esther Pacitti. Chiaroscuro: Transparency and Privacy for Massive Personal Time-Series Clustering. ACM Conference on Management of Data (SIGMOD), pp. 779-794, 2015.

[YZUB+2016] C-C M. Yeh, Y. Zhu, L. Ulanova, N. Begum, Y. Ding, H. Anh Dau, D. Furtado Silva, A. Mueen, E. Keogh. Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View that Includes Motifs, Discords and Shapelets. IEEE International Conference on Data Mining (ICDM), 2016.

Environment

This work will be done in the context of a collaboration between the INRIA Zenith team and Safran. The Zenith project-team ( https://team.inria.fr/zenith/ ), headed by Patrick Valduriez, aims to propose new solutions related to scientific data and activities. Our research topics include the management and analysis of massive and complex data, such as uncertain data, in highly distributed environments. Our team is located in Montpellier, a very active city in the south of France.

Safran ( https://www.safran-group.com/ ; https://en.wikipedia.org/wiki/Safran ) is a multinational company specializing in aircraft and rocket engines and aerospace component manufacturing.

Skills and profiles

Strong background in data mining

Strong skills in parallel data processing with Spark

A Ph.D. in computer science or mathematics

Duration, Salary and Location

Duration: 12 months

Annual Gross Salary: up to 42K€ depending on your experience.

Starting date: flexible but ideally as soon as possible.

This work will be done mainly in Montpellier, with regular visits to the Safran team in Paris.

Contact

Florent Masseglia (florent.masseglia@inria.fr) and Reza Akbarinia (reza.akbarinia@inria.fr).

Permanent link to this article: https://team.inria.fr/zenith/similarity-search-in-large-scale-time-series/