Research – Inria Associate Team on Graph Querying and Analytics

Research Objectives

Our main objective is to combine the graph query language expertise of the Inria group with the machine learning and graph analytics expertise of the Chilean group to produce a new generation of query languages that seamlessly integrate graph querying with analytics. As our initial basis we shall use the query language Cypher, developed by Neo4j (the graph database market leader; the language is currently implemented by many others, including Amazon and SAP). The reason is that, owing to its leading role, Cypher serves as the basis for the newly developed standard for the Graph Query Language, GQL; that standard, however, is not expected to be published before very late 2023. We shall look into integrating data analytics tasks with Cypher and later with GQL. This involves the following key tasks.

Graph projection: This is the term for outputting graphs as results of queries. Current graph query languages are graph-to-relational; that is, query outputs are relations. This needs to change so that graphs can be output and then used as inputs to analytical tasks. Existing solutions are extremely limited (e.g., Java libraries for creating lists of nodes and edges, with no possibility of combining them in any nontrivial way). Given the focus of past research on graph-to-relational languages, a multitude of questions need to be reworked in the graph-to-graph setting. These include the expressivity and complexity of languages, query optimization and evaluation strategies, updating catalogs, and graph views, including the possibility of view updates.
Incorporating analytics tasks into a query language: Current approaches are based on calling library functions, i.e., leaving the realm of a declarative language. The way to overcome this is to observe that most analytics tasks have tractable complexity and can thus be expressed with the help of fixed-point computations. The database field knows very well how to add fixed points to relational languages; our task is therefore to see how to do so for graph languages with graph projection in a way that achieves two goals: tractable complexity of query evaluation, and ease of writing queries that express analytics tasks such as shortest paths, PageRank, centrality measures and others.
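
To make the two tasks above concrete, here is a minimal Python sketch. It is not Cypher or GQL syntax and not a proposal from the project. The toy edge set and the function names `project` and `shortest_distances` are hypothetical, chosen only for illustration. The first function returns a graph rather than a relation (a graph-to-graph step); the second computes single-source shortest distances by repeating a relaxation rule until nothing changes, that is, until a fixed point is reached. Non-negative edge weights are assumed so that the iteration terminates.

```python
from math import inf

# Hypothetical toy graph: directed edges as (source, target, weight) triples.
EDGES = {
    ("a", "b", 1.0),
    ("b", "c", 2.0),
    ("a", "c", 5.0),
    ("c", "d", 1.0),
    ("x", "y", 1.0),
}


def project(edges, keep):
    """Graph-to-graph step: keep the edges whose endpoints both satisfy `keep`.

    Returns a (nodes, edges) pair, i.e. a graph, not a relation.
    Only nodes that occur in a kept edge are included.
    """
    kept = {(u, v, w) for (u, v, w) in edges if keep(u) and keep(v)}
    nodes = {u for (u, _, _) in kept} | {v for (_, v, _) in kept}
    return nodes, kept


def shortest_distances(nodes, edges, source):
    """Iterate dist[v] = min(dist[v], dist[u] + w) until no value changes.

    Assumes non-negative weights, so a fixed point is reached after
    finitely many rounds.
    """
    dist = {n: inf for n in nodes}
    dist[source] = 0.0
    changed = True
    while changed:
        changed = False
        for (u, v, w) in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
    return dist


# The projected graph is fed directly into the analytics step.
nodes, edges = project(EDGES, lambda n: n in {"a", "b", "c", "d"})
dist = shortest_distances(nodes, edges, "a")
print(dist)
```

The point of the sketch is the composition: because `project` returns a graph, its output can be passed to the analytics step without leaving the same setting, which is what a graph-to-relational language cannot offer directly.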

The outcome will be a series of foundational studies on the key elements of graph projection and query language design with fixed points, together with a set of proposals for incorporating them into new versions of graph query languages, notably new releases of Cypher and future versions of GQL, to be achieved via our prominent representation on the GQL design committee.

Year 1 Achievements

Graph query languages: New graph query languages are developed by committees made up almost exclusively of industry participants. The Inria PI is one of very few exceptions and, drawing on this inside knowledge, together with several members of the team (including Rogova in France and Vrgoč in Chile) produced a formalization of these languages while they were still in development, opening them up for investigation by the research community.
Graph schemas: Unlike relational databases, which come with a well-defined schema, graph databases can range from having no schema at all to having a very tight but product-specific schema. To provide a proper space of schema definitions, an academia-industry group created the PG-Schema proposal for schemas for property graphs. Several members of the team, both in France and in Chile, are members of that group.
Working with international standards committees: These activities, via the participation of the PI and the efforts of LDBC’s Formal Semantics Working Group (which includes several team members), are being fed back to the standards committee to inform it of academic advances and enhance the quality of new standards.