Virtually any decision, from voting to calling the doctor or buying stocks, is based on facts we find around us, and increasingly on the Internet. Facts are published by individuals (e.g. journalists, lobbyists, social users), organizations (e.g. public relations agencies, media outlets, governments), or machines (e.g. news generators, disaster monitors). The goal of this research is to create tools to find explanations for facts and verify claims made online. This raises several challenges:

  • Statements are generally made in natural language, which is notoriously hard to process algorithmically;
  • Even when statements are available in machine-processable form, determining the “truth” of claims is difficult because of the inherent lack of contextual information (time, space, political views, belief systems, etc.);
  • Assuming “sufficient” context is available, one still needs to use external sources and
    inference mechanisms to draw conclusions — if the trustworthiness of such sources and rules is in doubt, this may lead to weak or simply wrong conclusions. In this respect, it is clear that the process cannot be fully automated. The main focus of our work will be explanation finding via trusted sources, based on the observation that one can only trust a statement if one can explain it through rules and proofs that can themselves be trusted.


Assessing the truth of a claim is not possible without taking context and interpretation into
account, which depend on where, when and by whom the claim was produced, and by whom it is
consumed. While automated methods cannot settle such questions in general, they can still play a
crucial role in helping to find explanations. In particular, such a program should aim at
finding a complete, relevant, trustworthy, and possibly diverse set of explanations for users to
make informed decisions.
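As a toy illustration of this idea of explanation finding via trusted sources, the sketch below searches for a proof of a claim using only trusted facts and trusted rules. It is a minimal backward-chaining sketch that assumes an acyclic rule set; the facts and rule shown are entirely made up for illustration and do not come from any of the project's actual systems.

```python
def explain(claim, trusted_facts, trusted_rules):
    """Backward-chaining search for a proof of `claim` using only trusted
    facts and trusted rules (premises, conclusion); returns a nested proof
    tree, or None if no trusted explanation exists."""
    if claim in trusted_facts:
        return claim  # a trusted fact explains itself
    for premises, conclusion in trusted_rules:
        if conclusion == claim:
            # Try to explain every premise of a rule that concludes the claim.
            subproofs = [explain(p, trusted_facts, trusted_rules) for p in premises]
            if all(sp is not None for sp in subproofs):
                return (claim, subproofs)
    return None  # the claim cannot be explained from trusted sources

# Illustrative (made-up) facts and rule:
facts = {"INSEE reports unemployment fell in Q3", "INSEE is a trusted source"}
rules = [
    (["INSEE reports unemployment fell in Q3", "INSEE is a trusted source"],
     "unemployment fell in Q3"),
]
print(explain("unemployment fell in Q3", facts, rules))
```

The point of returning a proof tree rather than a yes/no answer is precisely the one made above: the user sees *why* the claim is supported, and can judge whether the sources and rules deserve their trust.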


From AIST:

From the Cedar team of Inria:

  • Maxime Buron (PhD student)
  • Tien Duc Cao (PhD student)
  • Felipe Cordeiro (intern, 2019)
  • Ludivine Duroyon (PhD student)
  • Lucas Elbert (M1 part-time intern, 2018)
  • Ioana Manolescu
  • Michaël Thomazo
  • Minh Huong Le Nguyen (M1 full-time intern, 2018)
  • Haris Sahovic (M1 part-time intern, 2018-2019)

Published Papers

  • A Content Management Perspective on Fact-Checking. The Web Conference 2018
    Sylvie Cazalens, Philippe Lamarre, Julien Leblay, Ioana Manolescu, Xavier Tannier.
  • ConnectionLens: Finding Connections Across Heterogeneous Data Sources. VLDB 2018, also accepted at BDA 2018
    Camille Chanial, Rédouane Dziri, Helena Galhardas,
    Julien Leblay, Minh-Huong Le Nguyen and Ioana Manolescu.
  • Computational fact-checking: a content management perspective. VLDB 2018
    Sylvie Cazalens, Julien Leblay, Ioana Manolescu, Philippe Lamarre, Xavier Tannier.

Notable results

    • Camille Chanial received the Prix de stage du Département d’Informatique for his ConnectionLens work, supervised by Julien Leblay, Ioana Manolescu and Helena Galhardas (November 2018)
    • ConnectionLens was demonstrated in April 2019 to a delegation accompanying the French Minister of Defense, Florence Parly, who visited Inria Saclay to announce the Defense AI initiative.



Meeting at Tokyo in June 2017


      • Ioana Manolescu: ANR ContentCheck project
        ANR is the French National Research Agency. I am currently coordinating the ContentCheck collaborative project with four academic partners (Inria, U. Rennes 1, U. Lyon 1, and the LIMSI lab of Université Paris-Sud) and the fact-checking journalist team of Le Monde, France’s leading national newspaper. The project aims at identifying foundational models, algorithms and tools based on data, knowledge and text management, and at applying them to journalistic fact-checking. I will present the project’s main goals and the results achieved so far. I will also share some of what we have learned about the fact-checking industry through this collaboration: how fact-checkers work, the tools they use, their international collaboration networks, and the tools jointly built for journalistic fact-checking.
      • Pascual Martínez-Gómez: Ccg2lambda: Compositional semantics for claim interpretation
        Claims are natural language expressions that machines cannot readily understand. We need a human-machine interface that automatically parses claims stated in natural language into machine-interpretable meaning representations. To this end, we have developed ccg2lambda, a system that composes symbolic meaning representations guided by syntactic derivations. Ccg2lambda is publicly available and has been used to perform natural language inferences in English and Japanese with great success.
      • Dan Han: Natural language phrase disambiguation and potential applications to entity/predicate linking
        Language expressions are naturally ambiguous. Whereas structural disambiguation is typically carried out by semantic parsers, lexical items and phrases still suffer from polysemy and surface divergence. In this talk I describe some approaches that we used to overcome phrasal ambiguity and map different (but semantically equivalent) phrases to each other. I will also briefly describe how similar ideas could be applied to map between phrases and KB identifiers.
      • Adrien Rougny: Using abductive reasoning to complete biological networks
        In this talk, I will introduce abductive reasoning, which consists in finding minimal sets of hypotheses that explain a new observation given a background theory.
        I will then illustrate this mode of reasoning by showing how it can be used to complete biological networks from new experimental data.
      • Michaël Thomazo: Ontology-Based Query Answering [pdf]
      • Maxime Buron: Providing Context to Semantic Queries [pdf]
        We propose an approach to build synthetic and informative groups of answers to a query over a knowledge base. Groups gather selected answers under related labels, which act as filters on the answer set. Ultimately, we hope this labelling will allow a user to explore the answers of the query.
      • Julien Leblay: Putting claims in perspective with BackDrop [pdf]
        Using the Web to assess the validity of claims presents many challenges. Whether the data comes from social networks or established media outlets, individual or institutional data publishers, one has to deal with scale and heterogeneity, as well as with incomplete, imprecise and sometimes outright false information. All of these are closely studied issues. Yet in many situations, the claims under scrutiny, and the data itself, have some inherent context-dependency making them impossible to completely disprove, or to evaluate through a simple (e.g. scalar) measure. While data models used on the Web typically deal with universal knowledge, we believe the time has come to put context, such as time or provenance, at the forefront and watch knowledge through multiple lenses. We present BackDrop, an application that enables annotating knowledge and ontologies found online to explore how the veracity of claims varies with context. BackDrop comes in the form of a Web interface, in which users can interactively populate and annotate knowledge bases, and explore under which circumstances certain claims are more or less credible.
      • Steven Lynden: Supporting Fact Checking Applications using Structured Open Web [pdf]
        Many fact-checking applications necessitate the extraction of claims from free text, which are then converted into a machine-readable representation. Although such a process is necessary for extracting claims, checking claims can be facilitated by readily available machine-readable data sources, such as Wikidata and sources of Linked Open Data. It has also been shown (e.g. at http://webdatacommons.org) that there has recently been an explosion of structured data embedded in web pages, using formats such as Microdata and JSON-LD. I will present an overview of the applicability of such data sources in automated fact-checking, and of how such sources can be effectively retrieved and queried.
      • Ludivine Duroyon: A data model for temporal beliefs
        We are designing a data model for a journalistic context. We need to model the points of view of agents (e.g. politicians), that is, their beliefs, together with a temporal dimension, capturing facts such as “who says/thinks what, and when?”. For this, we use an RDF (Resource Description Framework) database, which allows the use of ontological constraints. Through queries, we hope to be able to determine who changed their mind in recent weeks, or who disagrees with whom about something.
      • Nicolas Schwind: Merging Qualitative Spatial and Temporal Constraint Networks
        Representing time and space is an important task in many domains, such as natural language processing, geographic information systems (GIS), computer vision and robot navigation. Many qualitative approaches have been proposed to represent spatial or temporal entities and their relations. The majority of these formalisms use qualitative constraint networks (QCNs) to represent information about a system. In some applications, e.g. multi-agent systems, spatial or temporal information comes from different sources, i.e. each source provides a QCN representing relative positions between objects. With multiple sources providing spatial or temporal information, the underlying QCNs generally conflict, so it becomes necessary to resolve the conflicts and define a consistent set of spatial or temporal information representing the result of merging. I will present merging processes tailored to QCNs, taking inspiration from work on propositional merging.
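To make the notion of abductive reasoning introduced in Adrien Rougny's talk concrete, here is a minimal Python sketch: given a background theory of Horn rules, known facts, and a set of candidate hypotheses (abducibles), it enumerates subset-minimal hypothesis sets that entail an observation. The biological names in the example are purely illustrative assumptions, not taken from the talk.

```python
from itertools import combinations

def forward_chain(facts, rules):
    """Compute the closure of facts under Horn rules (premises -> conclusion)."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in closure and premises <= closure:
                closure.add(conclusion)
                changed = True
    return closure

def abduce(observation, facts, rules, abducibles):
    """Enumerate subset-minimal sets of abducible hypotheses that,
    added to the facts, entail the observation."""
    explanations = []
    # Enumerate candidate hypothesis sets by increasing size, so that
    # supersets of an already-found explanation can be skipped.
    for size in range(len(abducibles) + 1):
        for hyp in map(set, combinations(sorted(abducibles), size)):
            if any(e <= hyp for e in explanations):
                continue  # a subset already explains the observation
            if observation in forward_chain(facts | hyp, rules):
                explanations.append(hyp)
    return explanations

# Toy "biological network" example (all names are made up):
rules = [
    (frozenset({"kinase_active", "substrate_present"}), "protein_phosphorylated"),
    (frozenset({"protein_phosphorylated"}), "gene_expressed"),
]
facts = {"substrate_present"}
abducibles = {"kinase_active", "protein_phosphorylated"}
print(abduce("gene_expressed", facts, rules, abducibles))
```

Here the new observation "gene_expressed" is not derivable from the facts alone, and the search finds the minimal hypothesis sets that would explain it; real abduction systems over biological networks of course use far richer theories and dedicated solvers.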

Meeting at Saclay in January 2018


    • Ioana, Julien: A Content Management Perspective on Fact-Checking [pdf]
      Fact checking has captured the attention of the media and the public alike; it has also recently received strong attention from the computer science community, in particular from data and knowledge management, natural language processing and information retrieval; we denote these together under the term “content management”. In this paper, we identify the fact checking tasks which can be performed with the help of content management technologies, and survey the recent research works in this area, before laying out some perspectives for the future. We hope our work will provide interested researchers, journalists and fact checkers with an entry point in the existing literature as well as help develop a roadmap for future research and development work.
    • Pascual Martínez-Gómez: Grounded Semantic Parsing of Claims and Questions [pdf]
      We present our NLP attempts to obtain symbolic meaning representations (MRs) of questions and claims and show how these MRs diverge from the desired SPARQL representations. Then we present a reformulation where we first recognize entities, relations and types using Named Entity Recognition techniques and induce a weighted regular tree grammar (wRTG) that describes the possible SPARQL representations of the claims/questions. We believe these wRTGs are useful since they describe all possible claim/question grounded interpretations in a compact manner and lend themselves to ontological operations that restrict the search space.
    • Tien-Duc Cao: Searching for truth in a database of statistics [pdf]
      To combat misinformation, fact-checking journalists typically check the accuracy of claims against some trusted data source. Statistic databases such as those compiled by state agencies or by reputed international organizations are often used as trusted data sources, for their valuable, high-quality information. However, the usability of such statistic databases is limited when they are shared in a format such as HTML or spreadsheet tables: this makes it hard to find the most relevant dataset for checking a specific claim, or to quickly extract from a dataset the best answer to a given query. We present an approach to facilitate the use of statistic tables in fact-checking, by (i) identifying the statistic datasets most relevant for a given fact-checking task, and (ii) extracting from each dataset the best specific (precise) answer it may contain for a given query. We have implemented our approach and experimented on the complete corpus of statistics obtained from INSEE, the French national statistic institute. Our experiments and comparisons demonstrate the effectiveness of our proposed method.
    • Steven Lynden: Maintaining the freshness of data retrieved from autonomous, distributed data sources
      A growing amount of open data sources are potentially of use in fact checking applications. Effectively utilizing such data requires a means of maintaining its freshness when changes occur at the source, or recognizing when data may no longer reflect real-world facts accurately. I will review some related work in the areas of Web crawling strategies to maintain the freshness of search engines and predicting change on the Web. I will then offer some ideas for applying such work with respect to our fact checking project to promote discussion on the subject.
    • Adrien, Julien, Michael, Nicolas: Principles of fact checking
    • Ludivine: Talk on a model for journalism data
    • Camille, Rédouane, Minh Huong: Keyword Search in Heterogeneous Dataspaces
      In this work, we set out to formalize and develop a data management tool to solve a crucial problem in data journalism: finding connections between a given set of terms, concepts and/or entities, based on a set of heterogeneous, independently produced data sources. This research builds upon and extends prior work on keyword search in relational, RDF, or text corpora; we develop it based on a problem identified by the Décodeurs team of Le Monde, the leading French national newspaper.
    • Helena Galhardas: Approximate duplicate detection and elimination [pdf]
      Data journalism typically requires integrating data coming from heterogeneous data sources (e.g., JSON, RDF graphs, relational databases, text). In order to check the veracity of facts, journalists need to find connections between entities that may belong to distinct data sources. Therefore, it is important to identify when two or more entities refer to the same real-world entity (for example, “Emmanuel Macron” and “E. Macron” both refer to the President of France). The field of approximate duplicate detection studies the algorithms and techniques to find elements that refer to the same real-world entity.
      I will give an overview of the main challenges underlying approximate duplicate detection. Furthermore, I will briefly describe the main aspects of data fusion (also called approximate duplicate elimination) which is the step that follows the detection of approximate duplicates when one wants to have a single representation for the duplicates. Both activities are crucial in data cleaning.
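A minimal sketch of the kind of technique discussed in the duplicate-detection talk above: approximate matching of name strings using character-trigram Jaccard similarity, with a naive blocking step to avoid comparing every pair. The similarity threshold and the blocking key (the last token, e.g. a surname) are illustrative choices, not those of any system mentioned here.

```python
def ngrams(s, n=3):
    """Character n-grams of a lowercased, whitespace-normalized string."""
    s = " ".join(s.lower().split())
    return {s[i:i + n] for i in range(max(len(s) - n + 1, 1))}

def jaccard(a, b):
    """Jaccard similarity between the trigram sets of two strings."""
    ga, gb = ngrams(a), ngrams(b)
    return len(ga & gb) / len(ga | gb)

def find_duplicates(names, threshold=0.3):
    """Pairs of names whose trigram Jaccard similarity reaches the threshold.
    Blocking on the shared last token keeps the number of comparisons low;
    real systems use more robust blocking keys and similarity measures."""
    blocks = {}
    for name in names:
        blocks.setdefault(name.lower().split()[-1], []).append(name)
    pairs = []
    for group in blocks.values():
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                if jaccard(group[i], group[j]) >= threshold:
                    pairs.append((group[i], group[j]))
    return pairs

names = ["Emmanuel Macron", "E. Macron", "Angela Merkel"]
print(find_duplicates(names))
```

On this toy input, the two Macron variants land in the same block and exceed the threshold, while "Angela Merkel" is never compared against them. The data-fusion step mentioned in the talk would then pick or build a single representation for each detected duplicate group.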

Meeting at Lyon in April 2018

Ioana Manolescu and Julien Leblay met in Lyon in April 2018, where they jointly gave a tutorial on content management technologies for journalistic fact-checking.

Meeting at Tokyo in July 2018

Ioana Manolescu, Minh-Huong Le Nguyen, and Helena Galhardas from the Technical University of Lisbon visited AIRC Tokyo, where Camille Chanial was working under the supervision of Julien Leblay (AIST Tokyo). We worked on the ConnectionLens project, which was accepted as a demonstration at PVLDB 2018 and was also demonstrated at the French national database conference BDA 2018.

Meeting at Saclay in February 2019

Julien Leblay visited Inria Saclay in February 2019. We discussed future developments on ConnectionLens, together with Lucas Elbert and Haris Sahovic; their project ended shortly after (early March 2019).

Meeting at Tokyo in May 2019

Ioana Manolescu and Felipe Cordeiro visited AIRC in May 2019. We worked on the ConnectionLens platform, integrating different developments made by Felipe and Julien.
