ConnectionLens: finding connections across heterogeneous data sources

ConnectionLens: Connecting the dots across heterogeneous data sources



ConnectionLens demonstrated during the visit of Florence Parly, French Minister of Defense, to Inria Saclay in 2019

Data journalism projects often need to find connections across heterogeneous data sources, such as text documents and structured databases (relational, RDF, JSON, etc.). Such connections make it possible, for instance, to find all the private companies involving members of the Parliament or people closely associated with them. This is done by exploiting, on one hand, a set of text documents describing the involvement of people in contracts and, on the other hand, a structured database listing the members of the Parliament and possibly their relatives.

We have developed ConnectionLens, a tool for finding connections between user-specified search terms across heterogeneous data sources. ConnectionLens treats a set of heterogeneous, independently authored data sources as a single virtual graph, in which nodes represent fine-grained data items (relational tuples, attributes, key-value pairs, RDF, JSON or XML nodes…) and edges correspond either to structural connections (e.g., a tuple is in a database, an attribute is in a tuple, a JSON node has a parent…) or to similarity (sameAs) links. To further enrich the content journalists work with, we also apply entity extraction, which detects the people, organizations, etc. mentioned in text, whether in full-text documents or in text snippets found, e.g., in RDF or XML. An example is outlined below, where the red line traces a connection between “En Marche” (occurring in the data source on the left) and “AREVA” (mentioned in the data source on the right).

ConnectionLens can thus find and exploit connections across heterogeneous data sources without requiring the user to specify any join predicate. This makes it comparable to existing systems for keyword search within structured databases; ConnectionLens goes further, as it also handles heterogeneous data sources.
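
To make the graph view more concrete, here is a minimal Python sketch of the idea, not the actual ConnectionLens implementation. It assumes each source has already been loaded as nested dictionaries and lists, uses case-insensitive equality of values as a stand-in for similarity (sameAs) links, and leaves out entity extraction entirely. The function names build_graph and connect, as well as the sample data (people, party and company), are invented for illustration.

```python
from collections import deque


def build_graph(sources):
    """Turn {source_name: nested dict/list/scalar} into one undirected graph."""
    labels = {}   # node id -> label
    adj = {}      # node id -> set of neighbouring node ids
    leaves = []   # ids of value nodes, candidates for similarity links

    def add_node(label):
        node = len(labels)
        labels[node] = label
        adj[node] = set()
        return node

    def add_edge(a, b):
        adj[a].add(b)
        adj[b].add(a)

    def walk(value, label):
        # Structural edges: container -> member, attribute -> value.
        node = add_node(label)
        if isinstance(value, dict):
            for key, child in value.items():
                add_edge(node, walk(child, key))
        elif isinstance(value, list):
            for child in value:
                add_edge(node, walk(child, "item"))
        else:
            leaf = add_node(str(value))
            leaves.append(leaf)
            add_edge(node, leaf)
        return node

    for name, content in sources.items():
        walk(content, name)

    # Similarity links (assumption): values equal up to letter case.
    first_seen = {}
    for leaf in leaves:
        key = labels[leaf].casefold()
        if key in first_seen:
            add_edge(first_seen[key], leaf)
        else:
            first_seen[key] = leaf
    return labels, adj


def connect(labels, adj, term_a, term_b):
    """Breadth-first search for a shortest path between two keywords."""
    a, b = term_a.casefold(), term_b.casefold()
    starts = [n for n, text in labels.items() if a in text.casefold()]
    parent = {n: None for n in starts}
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        if b in labels[node].casefold():
            path = []
            while node is not None:
                path.append(labels[node])
                node = parent[node]
            return path[::-1]
        for neighbour in sorted(adj[node]):
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None  # the two keywords are not connected


# Invented sample data: a structured source and a document-like source.
sources = {
    "parliament_db": [{"member": "Alice Martin", "party": "Example Party"}],
    "contracts_doc": {"contract": {"signatory": "alice martin",
                                   "company": "ACME Corp"}},
}
labels, adj = build_graph(sources)
path = connect(labels, adj, "Example Party", "ACME")
print(" -> ".join(path))
```

The user supplies only the two keywords. The path found crosses from one source to the other through the similarity link between the two spellings of the same name, so no join predicate is needed. A real system would need approximate matching, extracted entities as additional nodes, and a search strategy that scales beyond a toy graph.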


ConnectionLens was developed as part of the ContentCheck ANR project and through the WebClaimExplain Associated Team between Inria and AIST Japan.
