PathWays

PathWays: Exploring heterogeneous data graphs through their entity paths

PathWays is a system that finds connections between entities, such as people, organizations, and locations, across documents. To do so, PathWays first loads a set of (potentially heterogeneous) datasets as a data graph (see https://arxiv.org/abs/2012.08830), then builds a collection graph, i.e., a summary of that data graph (see https://hal.inria.fr/hal-03767967). Based on this collection graph, PathWays enumerates all paths connecting two user-specified entities.
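The enumeration step can be illustrated with a small Java sketch. This is not PathWays' implementation: it assumes the collection graph is a small undirected graph whose nodes carry a type label, and it runs a depth-first search that lists simple paths from nodes of one type to nodes of another, bounded by a maximum length (in edges) and a maximum number of paths, mirroring the user parameters described further down. The class and method names (PathEnumerationSketch, enumerate, toyGraph) and the toy graph are hypothetical.

```java
import java.util.*;

/**
 * Hypothetical sketch, not PathWays code: bounded enumeration of simple paths
 * between nodes of two types in a small undirected summary graph.
 */
final class PathEnumerationSketch {
    private final Map<String, Set<String>> adjacency = new LinkedHashMap<>();
    private final Map<String, String> nodeType = new HashMap<>();

    void addNode(String node, String type) {
        adjacency.putIfAbsent(node, new LinkedHashSet<>());
        nodeType.put(node, type);
    }

    /** Adds an undirected edge; both endpoints must have been added first. */
    void addEdge(String a, String b) {
        adjacency.get(a).add(b);
        adjacency.get(b).add(a);
    }

    /** Returns at most maxPaths simple paths, each with at most maxLength edges. */
    List<List<String>> enumerate(String sourceType, String targetType, int maxLength, int maxPaths) {
        List<List<String>> results = new ArrayList<>();
        for (String start : adjacency.keySet()) {
            if (sourceType.equals(nodeType.get(start))) {
                Deque<String> current = new ArrayDeque<>();
                current.addLast(start);
                dfs(start, targetType, maxLength, maxPaths, current, results);
            }
        }
        return results;
    }

    private void dfs(String node, String targetType, int maxLength, int maxPaths,
                     Deque<String> current, List<List<String>> results) {
        if (results.size() >= maxPaths) {
            return; // enough paths collected
        }
        if (current.size() > 1 && targetType.equals(nodeType.get(node))) {
            results.add(new ArrayList<>(current)); // reached a target: record the path, do not extend it
            return;
        }
        if (current.size() - 1 >= maxLength) {
            return; // the path already has maxLength edges
        }
        for (String next : adjacency.get(node)) {
            if (!current.contains(next)) { // keep paths simple: no repeated node
                current.addLast(next);
                dfs(next, targetType, maxLength, maxPaths, current, results);
                current.removeLast();
            }
        }
    }

    /** Toy graph invented for illustration: a person linked to an organization in two ways. */
    static PathEnumerationSketch toyGraph() {
        PathEnumerationSketch g = new PathEnumerationSketch();
        g.addNode("person", "Person");
        g.addNode("spacecraft", "Spacecraft");
        g.addNode("country", "Value");
        g.addNode("org", "Organization");
        g.addEdge("person", "spacecraft");
        g.addEdge("spacecraft", "org");
        g.addEdge("person", "country");
        g.addEdge("country", "org");
        return g;
    }

    public static void main(String[] args) {
        System.out.println(toyGraph().enumerate("Person", "Organization", 2, 10));
        // prints [[person, spacecraft, org], [person, country, org]]
    }
}
```

Stopping as soon as a target-type node is reached is a design choice of this sketch: it keeps each result short and avoids paths that pass through another entity of the target type. Lowering the length bound to 1 returns no path on the toy graph, and a count bound of 1 keeps only the first path found.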


PathWays main pipeline


Download

PathWays is developed in Java and uses Postgres to store data. It can be downloaded at the following link:

https://gitlab.inria.fr/cedar/pathways

Publications

When referring to this work, please cite the article published in ADBIS 2023.

Demonstration

PathWays interface

The user starts by configuring PathWays through the form on the left, which asks:

  • Which database to use?
  • Which types of entities to connect?
  • How many paths to enumerate?
  • What is the maximum allowed length for a path?

One can also load a database and display the result in the GUI (form on the right).

PathWays user parameters (left) and PathWays database loading (right)


NASA dataset exploration

In the result below, we loaded the NASA dataset and looked for how organizations and people are connected. We obtained 7 paths, each leading to a set of data paths in the data graph. The user can sort these paths by length or by number of associated data paths, and can also hide paths that lead to an empty set of data paths.
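The sorting and hiding options on the path list can be sketched in a few lines of Java. The sketch assumes each path is summarized by a label, its length, and the number of data paths it leads to; the record PathSummary, the class PathListView, and the choice to list the path with the most data paths first are hypothetical, and the numbers in main are invented toy values, not results from the NASA dataset.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical entry of the path list: a label, the path length, and its number of data paths. */
record PathSummary(String label, int length, int dataPathCount) {}

final class PathListView {
    enum SortKey { LENGTH, DATA_PATH_COUNT }

    /** Optionally hides paths with no data path, then sorts (shortest first, or most data paths first). */
    static List<PathSummary> view(List<PathSummary> paths, SortKey key, boolean hideEmpty) {
        Comparator<PathSummary> order = switch (key) {
            case LENGTH -> Comparator.comparingInt(PathSummary::length);
            case DATA_PATH_COUNT -> Comparator.comparingInt(PathSummary::dataPathCount).reversed();
        };
        return paths.stream()
                .filter(p -> !hideEmpty || p.dataPathCount() > 0) // drop empty paths if requested
                .sorted(order)                                     // stable sort: ties keep input order
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<PathSummary> paths = List.of(
                new PathSummary("P1", 2, 8),
                new PathSummary("P2", 3, 0),
                new PathSummary("P3", 1, 3));
        // Shortest first, with the empty path P2 hidden: prints P3 then P1.
        view(paths, SortKey.LENGTH, true).forEach(p -> System.out.println(p.label()));
    }
}
```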

When the user clicks on a path, all associated data paths are shown in a tabular view in which each row corresponds to one data path, with the connected entities highlighted. The user can then:

  • Sort data paths by a column (the triangles in the column headers sort in ascending or descending order).
  • Filter data paths by a string (text boxes in the column headers). The checkbox next to the first and last text boxes indicates whether the search text should be matched against the entity only or against the whole text value.
  • Hide a data path that is not of interest (trash icon next to the ID).
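The three table operations above can be approximated in Java as follows. This is a minimal sketch under assumed data structures: each row has an ID and a list of cells, and a cell stores its full text value plus, for the first and last columns, the highlighted entity. The names Cell, DataPathRow, DataPathTable, filter, sortBy, and hide, as well as the case-insensitive substring matching, are hypothetical choices, not PathWays' actual behavior.

```java
import java.util.*;
import java.util.stream.Collectors;

/** One table cell: the full text value and, for the endpoint columns, the highlighted entity (else null). */
record Cell(String text, String entity) {}

/** One row of the tabular view: a data path ID and its cells (hypothetical model). */
record DataPathRow(int id, List<Cell> cells) {}

final class DataPathTable {

    /** Case-insensitive substring match on the entity only, or on the whole text value. */
    static boolean matches(Cell cell, String query, boolean entityOnly) {
        // Cells without an entity (middle columns) always fall back to their text value.
        String target = (entityOnly && cell.entity() != null) ? cell.entity() : cell.text();
        return target.toLowerCase(Locale.ROOT).contains(query.toLowerCase(Locale.ROOT));
    }

    /** Keeps rows whose cell in the given column matches the query; an empty query keeps all rows. */
    static List<DataPathRow> filter(List<DataPathRow> rows, int column, String query, boolean entityOnly) {
        return rows.stream()
                .filter(r -> query.isEmpty() || matches(r.cells().get(column), query, entityOnly))
                .collect(Collectors.toList());
    }

    /** Sorts rows by the text of one column, ascending or descending, ignoring case. */
    static List<DataPathRow> sortBy(List<DataPathRow> rows, int column, boolean ascending) {
        Comparator<DataPathRow> order = Comparator.comparing(
                (DataPathRow r) -> r.cells().get(column).text(), String.CASE_INSENSITIVE_ORDER);
        return rows.stream().sorted(ascending ? order : order.reversed()).collect(Collectors.toList());
    }

    /** Drops the rows the user has hidden, identified by their ID. */
    static List<DataPathRow> hide(List<DataPathRow> rows, Set<Integer> hiddenIds) {
        return rows.stream().filter(r -> !hiddenIds.contains(r.id())).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<DataPathRow> rows = List.of(
                new DataPathRow(1, List.of(new Cell("Acme Corp, Springfield", "Acme Corp"),
                        new Cell("probe-1", null), new Cell("Built with help from Alice", "Alice"))),
                new DataPathRow(2, List.of(new Cell("Globex, near Acme Street", "Globex"),
                        new Cell("probe-2", null), new Cell("Bob operated it", "Bob"))));
        // Entity-only match on the first column keeps row 1; whole-text match keeps both rows.
        System.out.println(filter(rows, 0, "acme", true).size());  // prints 1
        System.out.println(filter(rows, 0, "acme", false).size()); // prints 2
    }
}
```

The toy rows are invented; they only show why the entity-only checkbox matters: a search term may appear in the surrounding text of a value without being part of the highlighted entity.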

Tabular view of data paths for the path connecting organizations and people through spacecrafts

For example, let us have a look at how D. A. Nichols relates to organizations. She is connected to Vandenberg AFB (Air Force Base) in 8 ways: she is mentioned in the descriptions of 8 spacecraft, all of which were launched from Vandenberg AFB.

How is D. A. Nichols connected to Vandenberg AFB?

For a second example, let us have a look at how Thomas Stafford (https://en.wikipedia.org/wiki/Thomas_P._Stafford) is connected to organizations. On the first path, we obtain 3 references to Thomas Stafford (the three shown in the screenshot), connecting him to Edwards Air Force Base, Kodiak Launch Complex, and Vandenberg AFB. The other paths contain no references to him. Note that, on this path, Thomas Stafford is connected to organizations through a value (column "#val", namely "United States" for these data paths).

How is Thomas Stafford connected to organizations?

