Zenith seminar: Youcef Djenouri “Urban traffic outlier detection”, 14 Feb 2019

Youcef Djenouri will visit the team from Feb 12 to Feb 19 and will work on time series analytics with us.

He will give a talk on Feb 14 at 4pm in BAT5-02.022-JPN.

Title: “Urban traffic outlier detection”

Abstract:
In this talk, I present outlier detection approaches for urban traffic analysis. We divide existing solutions into two main categories: flow outlier detection and trajectory outlier detection. The first category groups solutions that detect flow outliers and includes statistical, similarity, and pattern mining approaches. The second category contains solutions that derive trajectory outliers, including offline processing for trajectory outliers and online processing for sub-trajectory outliers. Solutions in each category are described, illustrated, and discussed, and open perspectives and research trends are drawn. In this context, we can better understand the intuition, limitations, and benefits of existing urban traffic outlier detection algorithms. As a result, practitioners can receive some guidance for selecting the most suitable methods for their particular case.
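As a minimal illustration of the statistical family of flow outlier detectors surveyed in the talk (the similarity and pattern mining families are richer), here is a z-score-style sketch in Python; the traffic counts are invented:

```python
def flow_outliers(flows, k=2.0):
    """Flag readings more than k standard deviations from the mean: a
    toy statistical flow outlier detector, not the talk's algorithms."""
    n = len(flows)
    mean = sum(flows) / n
    std = (sum((x - mean) ** 2 for x in flows) / n) ** 0.5
    if std == 0:
        return []
    return [i for i, x in enumerate(flows) if abs(x - mean) > k * std]

# Hourly vehicle counts on one road segment; the spike at index 4 stands out.
counts = [120, 118, 125, 122, 510, 119, 121]
print(flow_outliers(counts))  # [4]
```

Real detectors must of course cope with seasonality and spatial correlation, which is where the similarity- and pattern-mining-based approaches come in.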

About Youcef Djenouri: 
YOUCEF DJENOURI received the Ph.D. degree in computer engineering from the University of Science and Technology Houari Boumediene, Algiers, Algeria, in 2014. From 2014 to 2015, he was a permanent Teacher-Researcher with the University of Blida, Algeria. In 2016, he worked on the BPM project supported by UNIST, South Korea. In 2017, he joined the University of Southern Denmark as a Postdoctoral Researcher, where he focused on urban traffic data analysis. He is now with the Norwegian University of Science and Technology, Trondheim, Norway, funded by the European Research Consortium for Informatics and Mathematics (ERCIM). He works on topics related to artificial intelligence and data mining, with a focus on time series analysis, frequent pattern mining, parallel computing, and evolutionary algorithms. He has been granted short-term research visits to several renowned institutions, including ENSMEA, Poitiers; the University of Poitiers; and the University of Lorraine. He has published more than 50 journal and conference papers, two book chapters, and one tutorial paper. Selected papers have appeared in top journals and conferences, including ACM Computing Surveys, IEEE Intelligent Systems, IEEE Access, Information Sciences, ICDM, and PAKDD.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-youcef-djenouri-urban-traffic-outlier-detection-14-feb-2019/

Ina and Inria CIFRE PhD thesis: “Large-scale Deep Learning for building knowledge bases and exploiting archives”


Topic

The growing number of audiovisual programs to be archived imposes new productivity constraints on documentation. Developing automatic and semi-automatic tools to assist the work of documentalists has become indispensable to make the best use of the very large amount of available information. In recent years, techniques for indexing and analyzing visual or audio content have thus emerged, enabling the modeling of high-level information such as faces, speakers, monuments, logos, sets, song titles, etc. Modeling consists of building visual representations of the entities with which we wish to annotate multimedia archives. The modeling processes are based on unsupervised, supervised, or sometimes weakly supervised learning methods.

With the rise of convolutional neural networks in recent years, hand-crafted visual representations are progressively being replaced by Deep Learning representations learned from training data dedicated to the target annotation task. These supervised learning strategies, going from the signal (pixels) all the way to classes or entities within a single formalism, have achieved very high performance for object recognition in images.

However, these methods have two major limitations for use in large-scale professional documentation. First, they operate in a closed world, i.e., with a fixed number of classes known in advance. In Ina’s setting, it is essential to operate in an open world because, at any time:

  • users may want to create new classes,
  • and the prediction system may be queried with images that do not belong to the training set, which it is essential to detect.

Second, to date these methods cannot be used efficiently in active and incremental learning processes such as relevance feedback or annotation propagation. Yet these dynamic and interactive modes of operation are indispensable for professional use. Ina employs dozens of documentalists whose mission is to annotate video documents. It is essential that these documentalists can interact with the recognition system and that it be sufficiently responsive.

More formally, the core of the thesis will be to tackle the problems of multi-label active learning and novelty detection in the context of deep learning of visual representations. This will require solving scalability challenges for methods based on deep models.
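As a deliberately simplified open-world illustration (all class names, coordinates, and the threshold below are invented, not the thesis method), novelty detection can be sketched as flagging an image whose deep feature embedding is far from every known class centroid, so documentalists can be alerted to a possible new class:

```python
import math

def is_novel(embedding, class_centroids, threshold):
    """True when the embedding is farther than `threshold` from all
    known class centroids, i.e., likely outside the training classes."""
    return min(math.dist(embedding, c) for c in class_centroids.values()) > threshold

# Toy 2-D "embeddings"; real deep features would have hundreds of dimensions.
centroids = {"logo_A": (0.0, 0.0), "face_B": (5.0, 5.0)}
print(is_novel((0.1, 0.2), centroids, threshold=1.0))    # False: close to logo_A
print(is_novel((10.0, -8.0), centroids, threshold=1.0))  # True: unknown class
```

The research challenge is precisely to make this kind of decision reliable and scalable over millions of archive images, and to combine it with interactive, incremental annotation.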

Supervision and context

The thesis will be supervised by Alexis Joly (HDR, Inria, https://scholar.google.fr/citations?user=kbpkTGgAAAAJ&hl=fr&oi=ao) and Olivier Buisson (Dr, Ina, https://scholar.google.fr/citations?user=rWunhTEAAAAJ&hl=fr). It continues more than 10 years of collaboration; in particular, two CIFRE theses were defended in 2013 and 2016 under their co-supervision. In addition, an R&D platform named Snoop has been co-developed. It is currently being experimented with at Ina and is also used for the PlantNet plant identification application (http://identify.plantnet-project.org).

The institutional actors of this thesis, Inria’s Zenith team and Ina, have solid experience in multimedia data analysis and scalability and will bring complementary skills to the topic. Zenith’s work focuses on the management, analysis, and retrieval of information in very large, heterogeneous data. At Ina, the doctoral student will join the Research and Innovation department, which addresses all research topics related to audiovisual archiving.

Application

Send the following documents by email, in PDF, to thcand@ina.fr:

  • CV,
  • cover letter targeted to the topic,
  • at least two recommendation letters,
  • transcripts and the list of courses taken in M1 and M2.

 

Position details

Start: during 2019, as soon as the Cifre application is accepted by the ANRT.

Salary: €36,000 gross over 13 months.

Location: Ina (Institut national de l’audiovisuel) in Bry-sur-Marne.

 

Permanent link to this article: https://team.inria.fr/zenith/these-cifre-ina-et-inria-apprentissage-profond-deep-learning-a-large-echelle-pour-la-creation-de-bases-de-connaissances-et-la-valorisation-darchives/

Zenith seminar: Renan Souza “Providing Online Data Analytical Support for Humans in the Loop of Computational Science and Engineering Applications”, 15 jan. 2019

Zenith seminar: 15 Jan. 2019, 3pm – BAT5-02.124

Providing Online Data Analytical Support for Humans in the Loop of Computational Science and Engineering Applications

Renan Souza (IBM Research Brazil and UFRJ, Rio de Janeiro)

Abstract: Computational scientists and engineers analyze complex, big data during the execution of long-lasting data processing workflows on parallel machines. Depending on the results, they may need to steer the workflows by adapting predefined input data or settings. Being able to analyze the resulting data online, knowing that certain results may have been directly influenced by specific actions they took, is of paramount importance for result interpretability, reuse, and reproducibility. However, three major challenges hinder such analysis: online analytical support, user steering tracking, and efficient performance. In this talk, I will focus on online analytical support, particularly for problems that require integrated data analysis by multi-workflows. Multi-workflows are distributed and parallel workflows that process data in heterogeneous data stores (e.g., DBMSs with various data models or raw data files) and share data dependencies. Such heterogeneity makes online analytical support even more challenging. We propose a solution that captures workflow provenance and domain data online to provide an integrated view over the data stores. We explore a real case study composed of four workflows that preprocess data for a Deep Learning classifier for Oil and Gas exploration. We show that our solution allows users to run online integrated data analysis of the multi-workflow data. Also, for certain scenarios, our solution is two orders of magnitude faster than a state-of-the-art solution.
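A hypothetical sketch of the online provenance-capture idea described in the abstract: whatever data store a workflow task touches, it also emits a provenance tuple into one shared relational view that can be queried while the workflows run. The schema, workflow names, and file names below are invented, not the actual system presented in the talk:

```python
import sqlite3

# One shared relational view of provenance across heterogeneous stores.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE provenance (
    workflow TEXT, task TEXT, input TEXT, output TEXT, t REAL)""")

def capture(workflow, task, input_ref, output_ref, t):
    """Record one task execution; called online, as tasks complete."""
    db.execute("INSERT INTO provenance VALUES (?, ?, ?, ?, ?)",
               (workflow, task, input_ref, output_ref, t))

# Two workflows sharing a data dependency: w2 consumes what w1 produced.
capture("w1", "clean",    "raw.csv",   "clean.csv", 1.0)
capture("w2", "classify", "clean.csv", "preds.db",  2.0)

# Online integrated analysis: which tasks consumed w1's output?
rows = db.execute("""SELECT workflow, task FROM provenance
                     WHERE input = 'clean.csv'""").fetchall()
print(rows)  # [('w2', 'classify')]
```

The hard parts the talk addresses, capturing this efficiently at scale and joining it with domain data spread across DBMSs and raw files, are of course not shown here.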

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-renan-souza-15-jan-2019/

Post-doc position: A/B testing guided clustering

Amadeus ( https://amadeus.com/en ) and the Zenith team of Inria ( https://team.inria.fr/zenith/ ) are seeking a postdoctoral fellow in A/B testing, clustering and time series analytics.

Title: A/B testing guided clustering

Description:

The post-doc position takes place within a new partnership between Amadeus and Inria. It is linked to Amadeus’ development of intelligent, evolving flight recommendation search systems for online travel agencies (OTAs). The general principle is to choose recommendations by optimizing several criteria simultaneously (price, duration of the trip, number of stops, etc.). Each flight recommendation is associated with a score defined as a linear combination of criteria and weights; the weights therefore define how important each criterion is. To adapt the importance of the criteria to the user’s profile, user queries are segmented by means of unsupervised classification (clustering). Weight values are optimized independently on each segment by maximizing the estimated reservation probability of the returned flight recommendations. Thus, a set of weights is associated with each user profile, called a segment. The weight creation process uses large volumes of data, especially during the segmentation phase. The ability of the flight recommendation search system to increase the conversion rate is evaluated using A/B test campaigns.

The expected work in this postdoc position comprises two complementary topics:
1. optimizing the planning of A/B test campaigns,
2. developing incremental methods for adapting the flight search segmentation from the results of A/B tests.

The objective of the first topic is to improve the use of A/B tests in order to draw conclusions as quickly and as safely as possible, and to know at each stage the uncertainty of the A/B test results.

The second topic is directly related to the first, since it uses the A/B test results obtained on each segment to improve the segmentation. The initial idea is to develop an incremental clustering algorithm in which search segmentation phases and A/B test phases alternate.
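To make the scoring scheme concrete, here is a hypothetical sketch of a per-segment linear score; the criteria, values, weights, and segment names are invented, not Amadeus’ actual model:

```python
CRITERIA = ("price", "duration", "stops")

# One weight vector per user segment, optimized offline by maximizing
# the estimated reservation probability on that segment (illustrative values).
SEGMENT_WEIGHTS = {
    "business": {"price": 0.2, "duration": 0.6, "stops": 0.2},
    "leisure":  {"price": 0.7, "duration": 0.2, "stops": 0.1},
}

def score(recommendation, segment):
    """Linear combination of criteria and the segment's weights.
    Criteria are costs (price in euros, duration in hours, number of
    stops), so lower scores rank better."""
    w = SEGMENT_WEIGHTS[segment]
    return sum(w[c] * recommendation[c] for c in CRITERIA)

flight = {"price": 250.0, "duration": 6.5, "stops": 1}
print(round(score(flight, "leisure"), 2))   # price dominates
print(round(score(flight, "business"), 2))  # duration dominates
```

The postdoc’s incremental-clustering question is then: how should the segment boundaries themselves move as A/B test results arrive for each segment’s weight vector?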

About Amadeus
Amadeus builds the critical solutions that help airlines and airports, hotels and railways, search engines, travel agencies, tour operators and other travel players to run their operations and improve the travel experience, billions of times a year, all over the world.

About Zenith
The Zenith project-team, headed by Patrick Valduriez, aims to propose new solutions related to scientific data and activities. Our research topics incorporate the management and analysis of massive and complex data, such as uncertain data, in highly distributed environments.

Skills and profile:

– Background in data mining / data analytics
– A Ph.D. in computer science or mathematics

Environment, salary, duration:

The postdoc will be supervised by Amadeus and Inria, while being located in the Amadeus facilities of Sophia Antipolis.

Salary: up to 3300 euros net/month, depending on experience.
Duration: 1 Year
Starting date: flexible but ideally as soon as possible.

Contact:

Nicolas Maillot ( nicolas.maillot@amadeus.com )
Florent Masseglia ( florent.masseglia@inria.fr )

 

Permanent link to this article: https://team.inria.fr/zenith/post-doc-position-a-b-testing-guided-clustering/

Zenith seminar: Eduardo Ogasawara “Comparing Motif Discovery Techniques with Sequence Mining in the Context of Space-Time Series”, 26 nov. 2018

Zenith seminar: 26 Nov. 2018, 2pm – BAT5-02.249

Comparing Motif Discovery Techniques with Sequence Mining in the Context of Space-Time Series

Eduardo Ogasawara

CEFET/RJ, Rio de Janeiro

Abstract: A relevant area being explored in the time series analysis community is finding patterns. Patterns are subsequences of a time series that are related to special properties or behaviors. A particular pattern that occurs a significant number of times in a time series is called a motif. Discovering motifs in time series data has been widely explored, and many techniques have been developed to tackle this problem. However, various important time-series phenomena present different behaviors when observed at different points of space (for example, series collected by sensors and IoT devices) and are better modeled as spatial-time series, in which each time series is associated with a position in space. For spatial-time series, the literature reveals an open gap: motifs might not be discovered when we analyze each time series individually, yet may be frequent if we consider different spatial-time series over some time interval. Finding patterns that are frequent in a constrained space and time, i.e., finding spatial-time motifs, may enable us to understand how a phenomenon unfolds in space and time. Meanwhile, the database/data mining community studies the problem of discovering spatiotemporal sequential patterns, which appears in a broad range of applications. Many initiatives find sequences constrained by space and time, which can shed light on how to tackle spatial-time motif discovery. We present these different techniques and the potential challenges and solutions arising from these two communities in the context of spatial-time series motif discovery.
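A toy sketch of the spatial-time motif idea: a symbolic subsequence that occurs only once per series can still be frequent when all sites are pooled. The crude discretization and the data below are invented, not the techniques compared in the talk:

```python
from collections import Counter

def discretize(window, levels="abc"):
    """Map a numeric window to a symbolic word (a crude SAX-like step)."""
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0
    return "".join(levels[min(int((x - lo) / span * len(levels)),
                              len(levels) - 1)] for x in window)

def spatial_time_motifs(series_by_site, w=3, min_count=2):
    """Count length-w symbolic subsequences across ALL sites; a word that
    is rare in any single series can still be a motif when sites are pooled."""
    counts = Counter()
    for site, series in series_by_site.items():
        for i in range(len(series) - w + 1):
            counts[discretize(series[i:i + w])] += 1
    return {word: c for word, c in counts.items() if c >= min_count}

# The rising pattern "abc" appears once per site, so per-series mining
# misses it, but pooling the two sites makes it a motif.
print(spatial_time_motifs({"s1": [1, 2, 3, 1], "s2": [2, 4, 6]}))
```

Real approaches additionally constrain the match to a spatial neighborhood and a time interval, which is exactly the gap the talk discusses.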

 

Short bio: Eduardo Ogasawara has been a Professor in the Computer Science Department of the Federal Center for Technological Education of Rio de Janeiro (CEFET/RJ) since 2010. He holds a D.Sc. in Systems Engineering and Computer Science from COPPE/UFRJ. His background is in databases, and his primary interest is data science. He is currently interested in data preprocessing, prediction, and pattern discovery in spatial-time series, as well as data-driven parallel and distributed processing. He is a member of the IEEE, ACM, INNS, and SBC. He led the creation of the Post-Graduate Program in Computer Science (PPCIC) of CEFET/RJ, approved by CAPES in 2016, and currently heads PPCIC.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-eduardo-ogasawara-comparing-motif-discovery-techniques-with-sequence-mining-in-the-context-of-space-time-series-26-nov-18/

Zenith seminar: Nicolas Anciaux “Personal Data Management Systems using Trusted Execution Environments” 21 nov. 2018

Zenith seminar
Wed. 21 Nov. 2018, 10:30am
Bat. 5, room 1.124
Personal Data Management Systems using Trusted Execution Environments
Nicolas Anciaux
Inria Saclay & UVSQ

Abstract: Thanks to smart disclosure initiatives and new regulations like the GDPR, Personal Data Management Systems (PDMS) are emerging. The PDMS paradigm empowers each individual with their complete digital environment. On the bright side, this opens the way to novel value-added services when crossing multiple sources of data of a given person or crossing the data of multiple people. Yet this paradigm shift towards user empowerment raises fundamental questions with regard to the appropriateness of the functionalities and the data management and protection techniques offered by existing solutions to lay users. This presentation (1) compares PDMS alternatives in terms of functionalities and threat models, (2) derives a general set of functionality and security requirements that any PDMS should consider, (3) proposes a preliminary design, building upon Trusted Execution Environments, for an extensive and secure PDMS reference architecture, and (4) identifies a set of challenges in implementing such a PDMS.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-nicolas-anciaux-personal-data-management-systems-using-trusted-execution-environments-21-nov-2018/

IBC and Zenith Seminar: Daniel de Oliveira “Parameter and Data Recommendation in Scientific Workflows based on Provenance”, 5 June 2018

IBC seminar (WP5): 5 June 2018, 2pm, room 1.124

Organized by Zenith

Parameter and Data Recommendation in Scientific Workflows based on Provenance
Daniel de Oliveira

Fluminense Federal University
Rio de Janeiro, Brazil

Abstract: A growing number of data- and compute-intensive experiments have been modeled as scientific workflows in recent years. Such experiments are commonly executed several times, varying parameters and input data files, since comparison plays an important role in scientific research. As the complexity of the experiments and the volume of input and intermediate data increase, scientists have to spend much time defining parameter values and data files to be processed in such experiments. This talk discusses the problem of identifying suitable parameter values and data files for an experiment and recommending them to the scientist. We present a novel method for making such recommendations, based on data captured from previous executions of the workflow and machine learning algorithms. Our experiments show that the recommended data files and parameters help scientists execute workflows successfully.

Permanent link to this article: https://team.inria.fr/zenith/ibc-and-zenith-seminar-daniel-de-oliveira-parameter-and-data-recommendation-in-scientific-workflows-based-on-provenance-5-june-2018/

Zenith seminar: Patrick Valduriez “Blockchain 2.0: opportunities and risks” 19 oct 2018

Zenith seminar: Friday 19 October 2018, 11am
BAT5-01.124

Blockchain 2.0: opportunities and risks
Patrick Valduriez
Zenith, Inria & LIRMM

Popularized by bitcoin and other digital currencies, the blockchain has the potential to revolutionize our economic and social systems. Blockchain was invented for bitcoin to solve the double spending problem of previous digital currencies without the need for a trusted, central authority. The original blockchain is a public, distributed ledger that can record and share transactions among a number of computers in a secure and permanent way. It is a complex distributed database infrastructure, combining several technologies such as P2P, data replication, consensus protocols and cryptography.

The term Blockchain 2.0 refers to new applications of the blockchain that go beyond transactions and enable the exchange of assets without powerful intermediaries. Examples of applications are smart contracts, persistent digital IDs, intellectual property rights, blogging, voting, reputation, etc. Blockchain 2.0 could dramatically cut down transaction costs, by automating operations and removing intermediaries. It could allow people to monetize their own information and creators of intellectual property to be properly compensated. The potential impact on society is also huge, as excluded people could join the global economy, e.g. by having digital bank accounts for free.

In this talk, I will introduce Blockchain 2.0 technologies and applications, and discuss the opportunities and risks. In developing countries, for instance, the lack of existing infrastructure and regulation may be a chance to embrace the blockchain revolution and leapfrog traditional solutions. But there are also risks, related to regulation, security, privacy, or integration with existing practice, which must be well understood and addressed.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-patrick-valduriez-blockchain-2-0-opportunities-and-risks-19-oct-2018/

Zenith seminar: Mathieu Fontaine “Alpha-stable process for signal processing” 20 sept 2018

Zenith seminar: Thursday 20 September 2018, 11am
BAT5-01.124, Campus Saint Priest

Alpha-stable process for signal processing
Mathieu Fontaine
Zenith, Inria & LIRMM

The scientific topic of sound source separation (SSS) aims at decomposing audio signals into their constitutive components, e.g., separating the main singer’s voice from the background music or the background noise. In the case of very old and degraded historical recordings, SSS goes well beyond classical denoising methods: it can account for complex signal or noise patterns and achieve efficient separation where traditional approaches fail.
Alpha-stable processes raise interesting mathematical challenges while offering efficient filtering applications and computational efficiency. This presentation studies these models from a theoretical point of view, with the aim of extending them in several directions: audio source localization, theoretical research in multichannel scenarios, and restoring old historical recordings.

Permanent link to this article: https://team.inria.fr/zenith/zenith-seminar-mathieu-fontaine-alpha-stable-process-for-signal-processing-20-sept-2018/

IBC seminar: Dennis Shasha “Reducing Errors by Refusing to Guess (Occasionally)” 1 June 2018

IBC seminar, organized by Zenith
Friday 1 June 2018, 2pm
Seminar room, Bat. 4, LIRMM

SafePredict: Reducing Errors by Refusing to Guess (Occasionally)
Dennis Shasha
Courant Institute, New York University

We propose a meta-algorithm to reduce the error rate of state-of-the-art machine learning algorithms by refusing to make predictions in certain cases, even when the underlying algorithms suggest predictions. Intuitively, our SafePredict approach estimates the likelihood that a prediction will be in error and, when that likelihood is high, refuses to go along with that prediction. Unlike other approaches, we can probabilistically guarantee an error rate on the predictions we do make (denoted the “decisive predictions”). Empirically, on seven diverse data sets from genomics, ecology, image recognition, and gaming, our method can probabilistically guarantee to reduce the error rate to 1/4 of that of the state-of-the-art machine learning algorithm, at a cost of between 11% and 58% refusals. Competing state-of-the-art methods refuse at roughly twice our rate (sometimes refusing all suggested predictions).
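The refusal idea can be illustrated with a much simpler confidence-threshold rule than SafePredict itself, which maintains an online estimate of the error likelihood with a probabilistic guarantee; the probabilities and threshold below are invented:

```python
def predict_or_refuse(probabilities, threshold=0.8):
    """Return the predicted class index, or None (refuse) when the top
    class probability falls below the threshold. This is only a toy
    stand-in for SafePredict's guaranteed-error-rate mechanism."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] >= threshold else None

print(predict_or_refuse([0.05, 0.9, 0.05]))   # 1: decisive prediction
print(predict_or_refuse([0.4, 0.35, 0.25]))   # None: refuse to guess
```

The point of the talk is that, unlike a fixed threshold, SafePredict adapts its refusal behavior so the error rate on the decisive predictions is provably bounded.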

Short bio

Dennis Shasha is the Julius Silver Professor of computer science at the Courant Institute of New York University and an Associate Director of NYU Wireless. He works on meta-algorithms for machine learning to achieve guaranteed correctness rates; with biologists on pattern discovery for network inference; with computational chemists on algorithms for protein design; with physicists and financial people on algorithms for time series; on clocked computation for DNA computing; and on computational reproducibility. Other areas of interest include database tuning as well as tree and graph matching. Because he likes to type, he has written six books of puzzles about a mathematical detective named Dr. Ecco, a biography about great computer scientists, and a book about the future of computing. He has also written five technical books about database tuning, biological pattern recognition, time series, DNA computing, resampling statistics, and causal inference in molecular networks. He has co-authored over eighty journal papers, seventy conference papers, and twenty-five patents. He has written the puzzle column for various publications including Scientific American, Dr. Dobb’s Journal, and the Communications of the ACM. He is a fellow of the ACM and an INRIA International Chair.

Permanent link to this article: https://team.inria.fr/zenith/ibc-seminar-dennis-shasha-reducing-errors-by-refusing-to-guess-occasionally-1-june-2018/