R&D Engineer, Triton project, November 11, 2014

High-potential R&D engineer to design a middleware for enterprise social networks

Beepeers, a fast-growing startup, has developed a platform that helps its clients (companies, local authorities, and other organizations) build social networks and applications for smartphones, tablets, and computers.

Inria is the French institute dedicated to digital science and technology.

The company and the research institute have joined forces to create “Triton”, an Inria Innovation Lab, in order to prepare the future evolution and large-scale deployment of Beepeers’ technology platform and to support the company’s strong growth.

More precisely, the goal of this lab is to build a modular, flexible, and dynamic middleware for enterprise social networks that eases scaling, the addition of new services, and the automatic deployment of Beepeers’ various solutions on clouds. This new architecture will rely heavily on the following recent technologies:

  • NoSQL databases (graph-oriented databases);
  • service-oriented architecture (Spring, OSGi, RESTful);
  • cloud computing for deployment;
  • Big Data for the analysis/extraction part (Hadoop).

Within the Triton Lab, the engineer will be expected to:

  • adapt these new technologies, in particular NoSQL databases and service-oriented architectures, to the constraints of the R&D project;
  • set up decentralized architecture mechanisms that allow the proposed solutions to scale;
  • design, for the specific needs of Beepeers’ solutions, efficient algorithms for propagating, disseminating, exchanging, and extracting information;
  • provide access to business or technical services located at other sites.

Candidate profile

  • Engineer (Master 2) with 2 or 3 years of experience, or holder of a PhD in the field;
  • 2 to 3 years of experience developing component-based software architectures in Java, notably with Spring;
  • Experience managing NoSQL databases (HBase, MongoDB, Cassandra…);
  • Autonomous and proactive; able to work in a team and in project mode.

Beyond the Lab, the engineer’s mission could lead to a key position in the company.

Application

Your curriculum vitae and a cover letter should be sent to:

Didier Parigot, Inria – Senior Researcher

Patrice Prez, Inria – Head Tech Transfer Office @ Sophia

Alain Prette, Beepeers – CEO

Zenith seminar “A Distributed Collaborative Filtering Algorithm With Multiple and Heterogeneous Data Sources”, Mohamed Reda Bouadjenek, October 10, 2014

Reda will present his recent work on distributed collaborative filtering on Friday, October 10, at 3:30pm (room to be defined).

A Distributed Collaborative Filtering Algorithm With Multiple and Heterogeneous Data Sources

Recommender systems are used as a means to supply users with content that may be of interest to them. They have attracted the attention of the research community and have become a popular research topic, where many aspects and dimensions have been studied to make them more accurate and effective (including the social dimension, the geographical dimension, the diversification aspect, etc.). Collaborative filtering (CF) is certainly one of the best-known recommendation methods. It consists in predicting whether, or how much, a user will like (or dislike) an item by leveraging knowledge of that user’s preferences as well as those of other users.

However, in practice, users interact with and express their opinion on only a small subset of items, which makes the corresponding user-item rating matrix very sparse. Consequently, in a recommender system, this data sparsity mainly induces two problems: (1) the lack of data to effectively model users’ preferences (new users suffer from the cold-start problem), and similarly (2) the lack of data to effectively model items (new items suffer from the cold-start problem since no user has rated them).

On the other hand, users use many online services that can provide information about their interests and the content of items (e.g. Google search engine, Facebook, Twitter, etc.). These services may be valuable data sources, supplying information that helps a recommender system model users and items, and thus making the recommender system more precise. Moreover, these data sources are distributed and geographically distant from each other, which raises many research problems and challenges for the design of a distributed recommendation algorithm.

Hence, in this talk, we present a new distributed collaborative filtering algorithm, which exploits and combines these multiple and heterogeneous data sources to improve the recommendation quality.

Short bio: Reda Bouadjenek received a master’s degree and a PhD in computer science from the University of Versailles, France, in 2009 and 2013 respectively. He is currently a postdoctoral researcher at INRIA and works on recommender systems. Previously, he worked for Alcatel-Lucent Bell Labs France from 2010 to 2013 as a researcher, then was a visiting researcher at NICTA & ANU, Australia, in 2013. His research interests include Information Retrieval, Social Network Analysis, Data Mining, Machine Learning, Recommender Systems, and Databases.
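To make the collaborative filtering idea and the cold-start problem from the abstract concrete, here is a minimal sketch of plain user-based CF on a tiny sparse rating table. It is not the distributed algorithm presented in the talk; the users, items, ratings, and the function names `cosine` and `predict` are all assumptions chosen for illustration.

```python
from math import sqrt

# Toy sparse rating data: user -> {item: rating}. All values are illustrative.
ratings = {
    "alice": {"a": 5.0, "b": 3.0, "c": 4.0},
    "bob":   {"a": 4.0, "b": 2.0, "d": 5.0},
    "carol": {"b": 5.0, "c": 1.0, "d": 2.0},
    "dave":  {},  # new user with no ratings: cold start
}

def cosine(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`.

    Returns None when no similar user has rated the item, which is
    exactly what happens for a cold-start user or item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den > 0 else None

print(predict("alice", "d"))  # a value between carol's 2.0 and bob's 5.0
print(predict("dave", "a"))   # None: no ratings to compare against
```

With no ratings for "dave", every similarity is zero and no prediction can be made; this is the gap that external data sources, as described in the abstract, are meant to fill.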

The third edition of Principles of Distributed Database Systems now released in Chinese.


The third edition of Özsu-Valduriez’s Principles of Distributed Database Systems (Springer 2011) has now been released in Chinese.

Translation by Prof. Li-Zhu Zhou published by Tsinghua University Press.

Patrick Valduriez wins the 2014 Inria – Académie des sciences – Dassault Systèmes Innovation Award

Established in 2011, the Inria Awards aim to promote the contributions and achievements of those who advance computer science and mathematics and who thereby take part in the development of our digital world.

Inria announces the winners of the 2014 Inria Awards.

CIFRE PhD thesis “Design of an innovative, open, extensible, and agile architecture for enterprise social networks”, July 7, 2014.


Design of an innovative, open, extensible, and agile architecture for enterprise social networks.

CIFRE thesis with the company Beepeers

Company: Beepeers, whose business is the creation of a collaborative platform of social tools for companies.

Workplace: Sophia Antipolis

Thesis advisor: Didier Parigot

Project-team: Zenith


For several years, the topics of managing large volumes of data (Big Data) and open data have been gaining importance with the rise of social networks and the internet. By exploiting or analyzing the data being handled, it is possible to extract new, relevant information that makes new services or tools possible. But to scale and to remain flexible to use, it is vital to design an innovative software architecture based on the new technologies of Big Data and cloud computing (SaaS). As part of a collaboration between our Zenith project-team and Beepeers, a very young startup that markets a platform for developing sector-specific social networks, we propose this research topic in order to design an innovative architecture for this platform that automates as much as possible the various instantiations of the Beepeers solution in the cloud and eases the introduction of new advanced services based on extracting or analyzing the data produced by these enterprise social networks.

Thesis objective

The objective of the thesis is to propose an innovative architecture that, on the one hand, quickly instantiates the various instances of the Beepeers solution in various cloud solutions according to the required features and, on the other hand, supports tools for extracting and analyzing data internal to the networks as well as data from other sources external to the networks. The Beepeers platform already offers a rich set of features and services that will form an excellent initial basis for this future research work.

Within this well-targeted application setting, the PhD student will propose an innovative architecture that combines the following techniques and makes them easy to implement:

  • data analysis and information extraction;
  • propagation or dissemination of information across the network or between different social networks connected to the Beepeers platform;
  • recommendation of people, services, or events based on the opinions of the network’s users (a feature already available in the Beepeers platform);
  • extraction through continuous (persistent) database queries over the open data sites that are available and relevant to the underlying sector-specific network.

An original implementation will be required, based on:

  • a decentralized service-oriented architecture so that the solutions can scale;
  • business-oriented databases such as Cassandra or MongoDB to manage large volumes of data;
  • dynamic, on-demand deployment of advanced services in the cloud.

Context of the collaboration

This collaboration is already the subject of a strong INRIA-SME partnership through the creation and launch this year of a joint laboratory (I-Lab), named Triton, whose R&D program is to develop an innovative architecture for scaling the Beepeers platform. This R&D program will rely on our expertise in decentralized service-oriented architecture through the use of our tool SON (Shared Overlay Network). The PhD student will therefore be supported in his or her proposals by the R&D team of the Triton joint laboratory and will be able to test and validate them on the new Beepeers platform developed within the Triton I-Lab. In addition, the PhD student will be able to draw on the scientific expertise of the Zenith project-team in scientific data management.

Expected results and candidate profile

The candidate should have a strong taste for the practical validation of his or her research work, and good abstraction skills in order to quickly grasp and master the various data analysis and extraction techniques coming from several scientific communities (databases, usage analysis, and distributed programming for the implementation). The candidate should be able to work in a team, in close collaboration with Beepeers, to carry out the research. This work should quickly find fields of application through the concrete and effective realization of new services for the Beepeers platform.

Candidate profile

  • Engineering school (BAC + 5) or Master 2;
  • Professional or institutional experience desired;
  • Field:
  • Component-based software architecture development in Java (e.g. Spring)
  • NoSQL databases (HBase, MongoDB, Cassandra)
  • Enjoys teamwork
  • Good level of English

How to apply (see the terms of a CIFRE thesis)

Please send your curriculum vitae, a letter of recommendation, and a cover letter to Didier Parigot as soon as possible.

Post-doctoral position on Massive Data Analytics

Post-doctoral position available at Inria.

Title: Massive Data Analytics

Location: Montpellier, south of France.

Duration: 1 year (starting in September 2014)

Keywords: data analytics, large scale distribution, knowledge discovery, pattern mining.

Description: Inria’s Zenith team, directed by P. Valduriez, offers a postdoctoral research position on massive data analytics. In the context of massive data distribution at very large scale, we must address major challenges to develop efficient solutions for analyzing the data. Technological solutions currently exist to support developers in this task, e.g. Apache Spark or the MapReduce framework. However, there are still crucial problems to solve in order to avoid prohibitive response times. For example, in the case of pattern extraction, it is vital to design extraction schemes that take into account the context of distribution and the characteristics of the infrastructure (typically, a straightforward implementation of Apriori in MapReduce for frequent pattern discovery is easy to write but leads to very low performance). The analytical techniques considered in this postdoctoral position relate to frequent patterns, frequent sequential patterns, or informative patterns (based on entropy). Depending on your background, you will work on one or more of these topics, in a large-scale distributed environment.
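As an illustration of the "straightforward" approach mentioned in the description, the sketch below simulates one MapReduce-style counting pass for itemsets of a fixed size k in plain Python. It is a simplified sketch, not the team's method and not a full Apriori implementation (there is no candidate pruning); the function names and the sample transactions are assumptions. Running one such pass per itemset size, each scanning all the data and emitting every candidate, hints at why the naive scheme performs poorly at scale.

```python
from collections import defaultdict
from itertools import combinations

def map_phase(transaction, k):
    """Emit (itemset, 1) for every k-item subset of one transaction."""
    for itemset in combinations(sorted(set(transaction)), k):
        yield itemset, 1

def reduce_phase(pairs):
    """Sum the counts per itemset, as reducers would after the shuffle."""
    counts = defaultdict(int)
    for itemset, one in pairs:
        counts[itemset] += one
    return counts

def frequent_itemsets(transactions, k, min_support):
    """One counting pass: keep the k-itemsets seen at least min_support times."""
    pairs = (kv for t in transactions for kv in map_phase(t, k))
    counts = reduce_phase(pairs)
    return {s: c for s, c in counts.items() if c >= min_support}

# Illustrative transactions only.
transactions = [
    ["bread", "milk"],
    ["bread", "beer", "eggs"],
    ["milk", "beer", "bread"],
    ["milk", "bread", "eggs"],
]

print(frequent_itemsets(transactions, k=2, min_support=2))
```

The number of emitted pairs grows combinatorially with transaction length and k, so all the pressure lands on the shuffle and the reducers.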

Salary: to be negotiated according to your experience.

Application: The candidate should have a strong background in large scale data management and be proficient in English. Send us a detailed CV, including a complete bibliography and recommendation letters.

Contacts: Florent Masseglia, Reza Akbarinia, Patrick Valduriez

More information about the team:

Best presentation award for Miguel during the Grid5000 Spring School 2014 in Lyon.

Miguel Liroz-Gistau has received the best presentation award from the Grid5000 Spring School 2014 in Lyon for his talk on “Using Grid5000 for MapReduce Experiments” (Miguel Liroz-Gistau, Reza Akbarinia, and Patrick Valduriez).

Abstract of the talk:
MapReduce is one of the most popular solutions for big data processing. In our recent research activities, we have improved the MapReduce framework by enhancing data locality and load balancing during the execution of MapReduce jobs. In particular, we developed two prototypes: 1) MRPart, for reducing the data transfer between map and reduce nodes; 2) FP-Hadoop, for bringing more parallelism to the framework and balancing the load of reduce nodes. We used Grid5000 to evaluate the performance of our solutions. In this paper, we describe our methodology for deploying and testing the developed prototypes in Grid5000.
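To illustrate the reducer load imbalance that the abstract refers to, here is a minimal sketch of how skewed intermediate keys overload one reducer under simple key-based partitioning. It shows the problem only and is not how MRPart or FP-Hadoop work; the function name `reducer_loads`, the modulo partitioning rule, and the key distribution are assumptions.

```python
from collections import Counter

def reducer_loads(keys, num_reducers):
    """Count the intermediate values each reducer receives when every
    key is routed to reducer `key % num_reducers`."""
    loads = [0] * num_reducers
    for key, count in Counter(keys).items():
        loads[key % num_reducers] += count
    return loads

# Skewed intermediate keys: key 0 is far more frequent than the others.
keys = [0] * 90 + [1] * 4 + [2] * 3 + [3] * 3

loads = reducer_loads(keys, num_reducers=4)
# Ratio between the busiest reducer and the average load.
imbalance = max(loads) / (sum(loads) / len(loads))

print(loads)      # [90, 4, 3, 3]
print(imbalance)  # 3.6
```

Because all values of a key go to the same reducer, the job finishes only when the most loaded reducer does, whatever the number of reducers.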

Mastodons International Workshop on “Big Data Management and Crowd Sourcing towards Scientific Data”, June 30, 2014

Monday, June 30, 2014, in Montpellier, 95 rue de la Galéra

IBC & LIRMM (UM2, CNRS-Mastodons), INRIA-UCSB associated team Bigdatanet

Organization: Esther Pacitti. LIRMM reception desk: +33 (0)4 67 41 85 85

Workshop Objective

In the context of the Mastodons project in Montpellier, we are addressing problems related to the management and analysis of big scientific data, in particular biology data such as those produced by next-generation sequencing tools and plant phenotyping platforms. The objective of the workshop is to discuss emerging solutions for big data management with world-class scientists. More information on the Mastodons website.

Data and Knowledge department seminar: “In-Memory Analytics: Accelerating Business Performance”, QuartetFS, June 23, 2014, 11am.

Room Galera 127, June 23, 2014, 11am
Organized by the Zenith team

In-Memory Analytics: Accelerating Business Performance
Antoine Chambille, Romain Colle
QuartetFS, Paris

The Big Data trend is a rebirth for Business Intelligence. On the one hand, web companies use technologies like Hadoop to extract value from data that was previously out of reach, because it is too big or because it is not structured. On the other hand, new databases that store data in memory reach levels of performance such that they can perform complex and interactive analysis on live data that changes in real time.

In concrete terms those In-Memory databases are the foundation for a new generation of business applications that bring the power of analytics to the hands of the decision makers who “run the business”.

Through this crash course on In-Memory technology, we will see through practical examples the competitive advantage it already brings to the best-performing organizations in the fields of e-commerce, logistics and finance.

About the speakers

Antoine Chambille is Head of Research and Development at Quartet FS. He joined Quartet FS soon after its creation back in 2005 and has been leading the team in charge of designing, developing and supporting Quartet FS’s in-memory analytics solutions. As one of the first employees, Antoine was heavily involved in the design of ActivePivot Server, Quartet FS’ in-memory OLAP engine.
Before joining Quartet FS, Antoine worked several years for a consulting firm specialised in the financial sector. From his years in consulting, he developed a strong customer orientation and he is keen on keeping a close eye on customers’ use cases. Antoine graduated from Ecole Polytechnique and Telecom Paris.

Romain Colle is a Project Manager within the Research and Development team of Quartet FS. He designed and developed Sentinel, Quartet FS’ flagship monitoring solution, and led the development efforts on ActivePivot’s Distributed Architecture. Romain has been involved in large ActivePivot projects such as Societe Generale in Paris, J.P.Morgan in London and DekaBank in Frankfurt. He joined Quartet FS in 2010 after 3 years spent at Oracle’s Headquarters in Redwood Shores, where he contributed to the development of their core database. Romain graduated from Stanford University (he holds a Master of Science in Computer Science) and from “Centrale” in Paris.

About Quartet FS and ActivePivot

Quartet FS provides business users with instant insight into massive amounts of data streaming at high frequency for timely and context-aware decision-making. Using Quartet FS’ in-memory aggregation engine ActivePivot, organisations are able to build 24×7, sense-and-respond applications that help them accelerate business performance, optimize operations, reduce operational risk and react to the unexpected.

Created in 2005, Quartet FS is a privately owned company with offices in Paris, London, New York, Hong Kong and Singapore. With more than fifty live implementations in large international groups, the company serves customers operating in time-sensitive and data-intensive environments such as financial services, market exchanges, logistics, transportation, and retail.

ActivePivot allows business users to extract actionable intelligence from massive amounts of fast-moving data, enabling them to make informed decisions on the spot. An in-memory Hybrid Transactional and Analytics Processing engine, ActivePivot aggregates data from multiple sources and processes multi-dimensional queries at unparalleled speeds on data that is updated on the fly.
As a result, business users are able to:

  • Focus on what really matters by pinpointing anomalies at an early stage with ActivePivot detecting changes in business conditions and pushing meaningful alerts
  • Get answers in sub-second time to any analytical enquiry with ActivePivot processing millions of live records across systems using its in-memory aggregation capabilities
  • Drill down and view data from any angle, at any point in the past, with ActivePivot providing the required contextual information for root cause analysis
  • Run ‘What-if’ analysis and evaluate the effect of alternative scenarios on the business

Zenith seminar: “CLyDE Mid-Flight: What we have learnt so far about the SSD-Based IO Stack”, by Philippe Bonnet (Univ. of Copenhagen), May 28, 2014

Zenith Seminar, Room Galera 127, May 28, 2014, 11am.

CLyDE Mid-Flight: What we have learnt so far about the SSD-Based IO Stack
Philippe Bonnet, INRIA and IT University of Copenhagen

Abstract: The quest for energy-proportional systems and the growing performance gap between processors and magnetic disks have led to the adoption of SSDs as the secondary storage of choice for a large range of systems. Indeed, SSDs offer great performance (tens of flash chips wired in parallel can deliver hundreds of thousands of accesses per second) with low energy consumption. This evolution introduces a mismatch between the simple disk model that underlies the design of today’s database systems and the complex SSDs of today’s computers. This mismatch leads to unpredictable performance, with orders-of-magnitude slowdowns in IO latency that can hit an application at any time.

To attack this problem, the obvious approach is to construct models that capture SSDs’ performance behaviour. However, our previous work has shown the limits of this approach because (a) performance characteristics and energy profiles vary significantly across SSDs, and (b) performance varies over time on a single device based on the history of accesses.

The CLyDE project is based on the insight that the strict layering that has been so successful for designing database systems on top of magnetic disks is no longer applicable to SSDs. In other words, our central hypothesis is that the complexity of flash devices cannot be abstracted away, as doing so results in unpredictable and suboptimal performance. We postulate that database system designers need a clear and stable distinction between efficient and inefficient patterns of access to secondary storage, so that they can adapt space allocation strategies, data representation, or query processing algorithms. We propose that (i) SSDs should expose this distinction instead of aggressively mitigating the impact of inefficient patterns at the expense of the efficient ones, and (ii) the operating system and the database system should explicitly provide mechanisms to ensure that efficient access patterns are favoured. We thus advocate a co-design of SSD controllers, operating system, and database system with appropriate cross-layer optimisations.

In this talk, I will report on the lessons we have learnt so far in the project. In particular, I will describe the SSD simulation frameworks that we have developed to explore cross-layer designs: EagleTree and LightNVM. I will discuss our findings on the importance of scheduling within an SSD. I will present our contribution to the re-design of the Linux block layer, which makes it possible for Linux to keep up with SSD performance on multi-socket systems. Finally, I will present preliminary results on the co-design of file system and SSDs.

CLyDE is a joint project between IT University of Copenhagen and INRIA Paris Rocquencourt, started in 2012 and funded by the Danish Council for Independent Research.

Bio: Philippe Bonnet is an associate professor at IT University of Copenhagen. Philippe is an experimental computer scientist focused on building and tuning systems for performance and energy efficiency. Philippe’s research interests include database tuning, flash-based database systems, secure personal data management, and sensor data engineering.

