ConnectionLens: graph integration of structured, semistructured and unstructured data

Data-intensive applications need to work with heterogeneous data sources, which can be structured (e.g., relational or CSV), semi-structured (e.g., JSON, XML or RDF), or unstructured (e.g., text or PDF). We have developed ConnectionLens, a system for integrating heterogeneous, independently authored data sources in a single graph. It is particularly suited to workloads that explore connections across data sources, across different data formats and at different granularities, such as data journalism projects. To discover connections across data sources and enhance their value for the user, ConnectionLens leverages Information Extraction (Named Entity Recognition) and Named Entity Disambiguation. Further, ConnectionLens allows querying the integrated graph by means of flexible keyword queries.
ConnectionLens is developed as part of the ANR/DGA AI Chair SourcesSay and also benefits from the support of the national “Plan IA” and of the DIM RFSI program. We explore applications in collaboration with Le Monde and WeDoData.
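
To make the idea of graph integration and keyword search more concrete, here is a minimal sketch in Python. It is not ConnectionLens code and does not use its API: the class `Graph`, the loaders, the function `connect`, the sample data and the dictionary `KNOWN_ENTITIES` (a toy stand-in for Named Entity Recognition) are all hypothetical. It also simplifies keyword search to a shortest path between two keyword matches.

```python
import csv
import io
import json
from collections import defaultdict, deque

# Toy stand-in for Named Entity Recognition: a fixed dictionary lookup (assumption).
KNOWN_ENTITIES = {"Alice Martin", "Acme Corp"}

class Graph:
    def __init__(self):
        self.adj = defaultdict(set)  # node -> set of neighbouring nodes

    def add_edge(self, a, b):
        self.adj[a].add(b)
        self.adj[b].add(a)

    def add_value(self, parent, name, value):
        """Add a leaf for one value and link it to every entity it mentions."""
        leaf = ("value", parent, name, value)
        self.add_edge(parent, leaf)
        for entity in KNOWN_ENTITIES:
            if entity in value:
                # Entity nodes are shared, so they connect different sources.
                self.add_edge(leaf, ("entity", entity))

def load_csv(graph, source, text):
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        row_node = ("row", source, i)
        graph.add_edge(("source", source), row_node)
        for column, value in row.items():
            graph.add_value(row_node, column, value)

def load_json(graph, source, text):
    # Flat objects only, to keep the sketch short.
    for key, value in json.loads(text).items():
        graph.add_value(("source", source), key, str(value))

def load_text(graph, source, text):
    graph.add_value(("source", source), "content", text)

def matches(graph, keyword):
    # A node matches if its label (last tuple component) contains the keyword.
    return [n for n in graph.adj if keyword.lower() in str(n[-1]).lower()]

def connect(graph, kw_a, kw_b):
    """Breadth-first search for a shortest path from a kw_a match to a kw_b match."""
    targets = set(matches(graph, kw_b))
    starts = matches(graph, kw_a)
    parent = {s: None for s in starts}
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

g = Graph()
load_csv(g, "donations.csv", "donor,amount\nAlice Martin,5000\n")
load_json(g, "registry.json", '{"company": "Acme Corp", "director": "Alice Martin"}')
load_text(g, "article.txt", "Acme Corp lobbied against the new regulation.")

path = connect(g, "5000", "regulation")
for node in path:
    print(node[0], "|", node[-1])
```

In this toy data set, the three sources share no structure. The only links between them are the entity nodes, so any path from the donation amount to the article has to pass through the extracted entities.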

Download

You can find the system here: https://gitlab.inria.fr/cedar/connectionlens

Publications

  • (Reference publication) “Graph integration of structured, semistructured and unstructured data for data journalism”, by Angelos-Christos Anadiotis, Oana Balalau, Catarina Conceicao, Helena Galhardas, Mhd Yamen Haddad, Ioana Manolescu, Tayeb Merabti, Jingmao You. In Information Systems (Elsevier), 104:101846, 2022.

    This article provides a complete description of the vision, the system architecture, and an experimental assessment as of early 2021.

  • (Application paper) “Empowering Investigative Journalism with Graph-based Heterogeneous Data Management”, by Angelos-Christos Anadiotis, Oana Balalau, Theo Bouganim, Francesco Chimienti, Helena Galhardas, Mhd Yamen Haddad, Stephane Horel, Ioana Manolescu, Youssr Youssef. Accepted for publication in a special issue of the IEEE Data Engineering Bulletin, 2021.

    Here we describe an application of ConnectionLens to the detection of conflicts of interest in the biomedical domain. To scale up its search, we also describe a novel, in-memory, parallel query answering engine.