Research


Presentation

Example: tyrex

Overall objectives

Objectives

We develop the foundations for the next generation of information extraction, data analysis and neuro-symbolic programming systems. Our research extends ideas from data management, artificial intelligence, programming languages and logic.

Extracting value from data increasingly requires sophisticated algorithms to represent, query, process, analyze and interpret data. We develop the foundations of data processing systems and neuro-symbolic programming, with a focus on extracting information from graph structures. These graph structures are obtained from raw data that may be more or less structured, noisy, uncertain or incomplete. Challenges include robust, efficient and scalable processing of large graphs obtained from such data. We study and build new information extraction methods, as well as new robust and scalable programming methods for rich graph data structures.

Last activity report: 2024

2024: PDF – HTML
2023: PDF – HTML
2022: PDF – HTML
2021: PDF – HTML
2020: PDF – HTML
2019: PDF – HTML
2018: PDF – HTML
2017: PDF – HTML
2016: PDF – HTML
2015: PDF – HTML

Results

New results

Efficient Enumeration of Recursive Plans in Transformation-based Query Optimizers

Query optimizers built on the transformation-based Volcano/Cascades framework are used in many database systems. Transformations previously proposed on the logical query DAG (LQDAG), the data structure at the core of such a framework, handle only recursion-free queries. We propose the recursive logical query DAG (RLQDAG), which extends the LQDAG with the ability to capture and transform recursive queries, leveraging recent developments in recursive relational algebra. Specifically, this extension includes: (i) the ability to capture and transform sets of recursive relational terms, made possible by (ii) annotated equivalence nodes that guide transformations, which are more complex in the presence of recursion; and (iii) RLQDAG rewrite rules that transform sets of subterms as a group, instead of transforming individual terms sequentially, and that (iv) incrementally update the necessary annotations. Core concepts of the RLQDAG are formalized with a syntax and formal semantics, with particular attention to subterm sharing and recursion. The result is a clean generalization of the LQDAG transformation-based approach that enables more efficient exploration of plan spaces for recursive queries 4. An implementation of the proposed approach shows significant performance gains compared to the state of the art 5 [6.1.1].
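To make the idea of "transforming recursive terms" concrete, the sketch below evaluates a fixpoint operator from recursive relational algebra in plain Python and checks two equivalences of the kind a transformation-based optimizer can record in a single equivalence class: the left-linear and right-linear forms of transitive closure, and a selection pushed into the base case of the fixpoint. The helper names (`compose`, `select_src`, `mu`) and the toy relation `R` are hypothetical illustrations; this is not the RLQDAG data structure or its rewrite rules.

```python
from typing import Callable, FrozenSet, Tuple

Rel = FrozenSet[Tuple[str, str]]  # a binary relation as a set of (src, dst) pairs

def compose(a: Rel, b: Rel) -> Rel:
    # Join on a.dst = b.src and keep (a.src, b.dst).
    return frozenset((x, z) for (x, y) in a for (y2, z) in b if y == y2)

def select_src(rel: Rel, node: str) -> Rel:
    # Selection sigma_{src = node}.
    return frozenset(t for t in rel if t[0] == node)

def mu(body: Callable[[Rel], Rel]) -> Rel:
    # Least fixpoint of a monotone term, by naive iteration from the empty relation.
    x: Rel = frozenset()
    while True:
        nxt = body(x)
        if nxt == x:
            return x
        x = nxt

R: Rel = frozenset({("a", "b"), ("b", "c"), ("c", "d")})

# Two recursive terms computing the transitive closure of R.
left = mu(lambda X: R | compose(X, R))    # mu X. R ∪ X.R
right = mu(lambda X: R | compose(R, X))   # mu X. R ∪ R.X

# Selection applied after the fixpoint vs. pushed into its base case.
filtered = select_src(left, "a")
pushed = mu(lambda X: select_src(R, "a") | compose(X, R))

print(sorted(filtered))
```

Pushing the selection is valid here only because `compose(X, R)` preserves the source column of `X`; the pushed form starts from a single tuple instead of the whole relation, which is the kind of saving that makes exploring such equivalent recursive plans worthwhile.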

Schema-Based Query Optimisation for Graph Databases

Recursive graph queries are increasingly popular for extracting information from interconnected data in domains such as social networks, life sciences, and business analytics. Graph data often come with schema information that describes how nodes and edges are organized. We propose a type inference mechanism that enriches recursive graph queries with relevant structural information contained in a graph schema. We show that this schema information can improve performance when evaluating acyclic recursive graph queries. Furthermore, we prove that the proposed method is sound and complete, ensuring that the semantics of the query is preserved during the schema-enrichment process 6. Experimental results with a complete implementation of the approach show drastic performance gains for query evaluation over property graphs 6 [6.1.1].
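As a rough illustration of why schema information helps with acyclic recursion, the sketch below takes a hypothetical schema given as `(source label, edge label, target label)` triples and enumerates the label paths that a Kleene-plus pattern such as `partOf+` can follow from a start label. When the schema has no cycle over that edge label, the recursion can in principle be replaced by a finite union of fixed-length, typed paths. The function `unfold_plus` and the example schema are assumptions made for illustration; this is not the type inference mechanism of the cited work, and it makes no soundness or completeness claim.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical schema format: (source node label, edge label, target node label).
Schema = Set[Tuple[str, str, str]]

def unfold_plus(schema: Schema, start: str, edge: str) -> List[List[str]]:
    """List every node-label path that `edge+` can follow from `start`.

    Raises ValueError if a cycle over `edge` is reachable, since the
    recursion then cannot be replaced by a finite union of paths.
    """
    succ: Dict[str, List[str]] = {}
    for src, label, tgt in schema:
        if label == edge:
            succ.setdefault(src, []).append(tgt)

    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        for nxt in sorted(succ.get(path[-1], [])):
            if nxt in path:
                raise ValueError(f"cycle on '{edge}' through '{nxt}': keep the recursion")
            paths.append(path + [nxt])
            extend(path + [nxt])

    extend([start])
    return paths

schema: Schema = {
    ("City", "partOf", "Region"),
    ("Region", "partOf", "Country"),
    ("Person", "livesIn", "City"),
}
print(unfold_plus(schema, "City", "partOf"))
# [['City', 'Region'], ['City', 'Region', 'Country']]
```

With this schema, `partOf+` starting from a `City` node can only reach a `Region` in one hop or a `Country` in two, so the recursive pattern could be evaluated as the union of those two typed paths.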

Efficient Iterative Programs with Distributed Data Collections

Big data programming frameworks have become increasingly important for developing applications in which performance and scalability are critical. In these complex frameworks, optimizing code by hand is hard and time-consuming, which makes automated optimization particularly necessary. A prerequisite for automating optimization is to find suitable abstractions to represent programs, for instance algebras based on monads or monoids to represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way that allows them to be analyzed or rewritten. In this work, we extend a monoid algebra with a fixpoint operator that represents recursion as a first-class citizen, and we show how it enables new optimizations. Experiments with the Spark platform illustrate the performance gains brought by these systematic optimizations 3.
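The following sketch shows, in plain Python rather than Spark, the two ingredients the paragraph combines: a monoid (an associative operation with an identity) lets each partition of a distributed collection be folded independently and the partial results merged, and an explicit fixpoint operator makes an iterative program a single term that could be analyzed. The names `Monoid`, `map_reduce`, `fix` and the partitioned edge list are hypothetical; this is not the monoid algebra or the optimizations of the cited work.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, FrozenSet, Generic, List, Tuple, TypeVar

A = TypeVar("A")
T = TypeVar("T")

@dataclass(frozen=True)
class Monoid(Generic[A]):
    zero: A                       # identity element
    plus: Callable[[A, A], A]     # associative combine

def map_reduce(m: Monoid[A], f: Callable[[T], A], partitions: List[List[T]]) -> A:
    # Fold each partition independently (as a worker would), then merge the partials.
    partials = [reduce(m.plus, (f(x) for x in part), m.zero) for part in partitions]
    return reduce(m.plus, partials, m.zero)

def fix(step: Callable[[A], A], init: A) -> A:
    # Iterate `step` until the value stops changing.
    x = init
    while True:
        nxt = step(x)
        if nxt == x:
            return x
        x = nxt

union: Monoid[FrozenSet[str]] = Monoid(frozenset(), lambda a, b: a | b)

# A toy edge collection split into two partitions.
edges: List[List[Tuple[str, str]]] = [[("a", "b"), ("b", "c")], [("c", "d"), ("x", "y")]]

def step(reached: FrozenSet[str]) -> FrozenSet[str]:
    # One iteration: add the targets of edges whose source is already reached.
    return reached | map_reduce(
        union, lambda e: frozenset({e[1]}) if e[0] in reached else frozenset(), edges
    )

reachable = fix(step, frozenset({"a"}))
print(sorted(reachable))  # ['a', 'b', 'c', 'd']
```

Because `union` is associative with `frozenset()` as identity, the result does not depend on how edges are split across partitions, which is the property that lets such an aggregation be distributed safely.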

Reproduce, Replicate, Reevaluate. The Long but Safe Way to Extend Machine Learning Methods

Reproducibility is a desirable property of scientific research. On the one hand, it increases confidence in results. On the other hand, reproducible results can be extended on a solid basis. In rapidly developing fields such as machine learning, the latter is particularly important to ensure the reliability of research. We present a systematic approach to reproducing (using the available implementation), replicating (using an alternative implementation) and reevaluating (using different datasets) state-of-the-art experiments. This approach enables the early detection and correction of deficiencies and thus the development of more robust and transparent machine learning methods. We detail the independent reproduction, replication, and reevaluation of the originally published experiments for a method that we want to extend. For each step, we identify issues and draw lessons. We further discuss solutions that have proven effective in overcoming the problems encountered. This work can serve as a guide for further reproducibility studies and, more generally, help improve reproducibility in machine learning 7 [6.1.3].

Approximate weighted model counting for neural probabilistic reasoning

Neural probabilistic reasoning is a neuro-symbolic artificial intelligence method that has shown promising results, especially with systems such as Scallop, which have achieved state-of-the-art results on various tasks such as visual question answering. In the probabilistic reasoning phase, computing the probabilities of logical provenance formulas is a major bottleneck. To address this problem, this work proposes a new approximation algorithm based on DPLL. Although it admits an exponential lower bound on its complexity and is closely related to commonly used knowledge compilation methods, this algorithm performs better in practice. In addition, this algorithm makes it possible to avoid the complexity of the logical provenance computation phase, enabling new possibilities 9.
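For readers unfamiliar with DPLL-style weighted model counting, the sketch below computes the probability of a monotone DNF formula (a disjunction of proofs, each a conjunction of independent probabilistic facts) by Shannon expansion, P(φ) = p(x)·P(φ|x=1) + (1 − p(x))·P(φ|x=0). To turn it into an approximation, it stops branching after a depth `budget` and returns sound bounds: a single conjunct's probability is a lower bound and the union bound is an upper bound. The representation, the branching heuristic and this bounding scheme are assumptions chosen for illustration; this is not the approximation algorithm of the cited work.

```python
from math import prod
from typing import Dict, FrozenSet, List, Tuple

DNF = List[FrozenSet[str]]  # monotone DNF: a disjunction of conjunctions of variables

def condition(phi: DNF, var: str, value: bool) -> DNF:
    # Set `var` to True (drop it from every conjunct) or False (drop conjuncts using it).
    if value:
        return [c - {var} for c in phi]
    return [c for c in phi if var not in c]

def wmc_bounds(phi: DNF, p: Dict[str, float], budget: int) -> Tuple[float, float]:
    """Return (lower, upper) bounds on P(phi), assuming independent variables."""
    if any(len(c) == 0 for c in phi):
        return 1.0, 1.0  # an empty conjunction is true
    if not phi:
        return 0.0, 0.0  # an empty disjunction is false
    if budget == 0:
        probs = [prod(p[v] for v in c) for c in phi]
        return max(probs), min(1.0, sum(probs))  # single-conjunct bound, union bound
    # Branch on the most frequent variable (ties broken alphabetically).
    counts: Dict[str, int] = {}
    for c in phi:
        for v in c:
            counts[v] = counts.get(v, 0) + 1
    var = max(sorted(counts), key=counts.__getitem__)
    lo1, hi1 = wmc_bounds(condition(phi, var, True), p, budget - 1)
    lo0, hi0 = wmc_bounds(condition(phi, var, False), p, budget - 1)
    w = p[var]
    return w * lo1 + (1 - w) * lo0, w * hi1 + (1 - w) * hi0

# (a AND b) OR (a AND c), all facts with probability 0.5.
phi: DNF = [frozenset({"a", "b"}), frozenset({"a", "c"})]
p = {"a": 0.5, "b": 0.5, "c": 0.5}
print(wmc_bounds(phi, p, 10))  # (0.375, 0.375): exact once the budget suffices
print(wmc_bounds(phi, p, 0))   # (0.25, 0.5): bounds without any branching
```

A larger budget tightens the interval at the cost of more branching, which is the usual trade-off between exact and approximate model counting.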


Research direction 1

…….

Research direction 2

……….

Research direction 3

……….
