ENS, S16

**Capturing Homomorphism-Closed Decidable Queries with Existential Rules**

Existential rules are a well-studied ontology-mediated query language for which the chase provides a generic computational approach to query answering. It is straightforward that existential rule queries exhibiting chase termination are decidable and can only recognize properties that are preserved under homomorphisms. In this paper, we show the converse: every decidable query that is closed under homomorphisms can be expressed by an existential rule set for which the standard chase universally terminates. Membership in this fragment is not decidable, but we show via a diagonalisation argument that this is unavoidable.
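The following minimal Python sketch makes the chase concrete. It implements a restricted chase, in which a rule fires only if its head is not already satisfied, and we assume this is the flavour the abstract calls the standard chase. Facts are tuples, and terms starting with `?` are variables. Head variables that do not occur in the body are existential and receive fresh labelled nulls (`_n0`, `_n1`, ...). The predicate names, the rule set, and the `max_rounds` cut-off are illustrative assumptions, not taken from the paper.

```python
from itertools import count

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(atoms, facts, binding):
    """Yield every extension of `binding` that maps all `atoms` into `facts`."""
    if not atoms:
        yield binding
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        b = dict(binding)
        # Variables bind (or must agree with an earlier binding); constants must be equal.
        if all(b.setdefault(t, v) == v if is_var(t) else t == v
               for t, v in zip(first[1:], fact[1:])):
            yield from match(rest, facts, b)

def restricted_chase(facts, rules, max_rounds=100):
    facts = set(facts)
    fresh = count()
    for _ in range(max_rounds):
        fired = False
        for body, head in rules:
            for b in list(match(body, facts, {})):
                # Restricted check: skip the trigger if its head already holds.
                if next(match(head, facts, b), None) is not None:
                    continue
                for atom in head:
                    for t in atom[1:]:
                        if is_var(t) and t not in b:
                            b[t] = f"_n{next(fresh)}"  # fresh labelled null
                facts.update(tuple([a[0]] + [b[t] if is_var(t) else t for t in a[1:]])
                             for a in head)
                fired = True
        if not fired:
            return facts  # fixpoint: no active trigger left
    raise RuntimeError("no fixpoint within max_rounds")

facts = {("Employee", "alice"), ("Employee", "bob"), ("worksIn", "bob", "sales")}
rules = [
    ([("Employee", "?x")], [("worksIn", "?x", "?d")]),  # every employee works somewhere
    ([("worksIn", "?x", "?d")], [("Dept", "?d")]),      # ... and that place is a department
]
result = restricted_chase(facts, rules)
```

Bob already works in `sales`, so the restricted check prevents the chase from inventing a redundant null for him. Only Alice receives the null `_n0`.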

Online seminar: https://bbb.di.ens.fr/b/cam-pz6-kdj

**Traversal algorithms and heuristics for reasoning over logic based knowledge graphs (Davide Benedetto)**

Knowledge Graphs (KGs) provide a concise and intuitive abstraction for a variety of domains where edges capture the (potentially recursive) relationships between the entities. This is leading to the rise of systems and tools able to facilitate graph data modeling, processing, and analysis, with prominent AI companies developing core systems based on the property graph model.

In this context, Datalog-based languages are being rediscovered as a flexible means of accomplishing reasoning tasks over complex property graphs, as they provide the essential elements to enable graph navigational operations.

The semantics of a Datalog program is usually specified operationally via the chase procedure. The chase involves multiple nondeterministic choices, such as the order in which rules are applied and the order in which facts are bound when multiple unifications are possible. In state-of-the-art reasoners, chase-based procedures are not adopted directly. Instead, they are encoded as engineered variations of the volcano iterator model, and thus essentially within a pipe-and-filter architecture, where nodes (filters) are relational algebra operators and edges (pipes) are dependency connections between rules. Such (potentially cyclic) structures, known as access plans, need to be translated into reasoning plans, where abstract relational algebra operators are transformed into specific project, select, and join implementations. Many implementations of each operator exist, and it is up to the optimizer to choose the best one in terms of execution cost.
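The pipe-and-filter view can be illustrated with Python generators standing in for volcano-style iterators, where each operator pulls tuples from its child on demand. The sketch below plugs two interchangeable join implementations (nested-loop and hash join) into the same plan, which is the kind of choice the abstract attributes to the optimizer. It is not how Vadalog or any particular reasoner is built, and the `edges` relation and the two-hop query are hypothetical.

```python
from collections import defaultdict

def scan(rows):
    yield from rows

def select(pred, child):
    for row in child:
        if pred(row):
            yield row

def project(cols, child):
    for row in child:
        yield tuple(row[c] for c in cols)

def nested_loop_join(left, right, lkey, rkey):
    right = list(right)  # materialise the inner input so it can be rescanned
    for l in left:
        for r in right:
            if l[lkey] == r[rkey]:
                yield l + r

def hash_join(left, right, lkey, rkey):
    table = defaultdict(list)
    for r in right:  # build phase on the right input
        table[r[rkey]].append(r)
    for l in left:   # probe phase streams the left input
        for r in table.get(l[lkey], []):
            yield l + r

def two_hop(edges, join):
    # Plan: project[0,3]( select[src != dst]( edges JOIN[e1.dst = e2.src] edges ) )
    joined = join(scan(edges), scan(edges), 1, 0)
    return list(project((0, 3), select(lambda row: row[0] != row[3], joined)))

edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "a")]
print(two_hop(edges, nested_loop_join))  # [('a', 'c'), ('b', 'd')]
print(two_hop(edges, hash_join))         # [('a', 'c'), ('b', 'd')]
```

Both physical plans return the same answers. They differ only in cost: the nested-loop join rescans its inner input for every outer tuple, while the hash join builds a table once.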

Here I focus on cases where the Datalog reasoning process involves a graph traversal task, and I investigate the connection between reasoning plans and graph traversal strategies. Starting from the observation that the nondeterministic choices posed by the chase can be leveraged to control graph traversals, allowing one to alternate between breadth-first and depth-first strategies, I study how such choices relate to reasoning plans. I will show that, in reasoning plans, specific join implementations and rule prioritization policies reflect these nondeterministic choices and can be exploited to guide graph traversals in modern reasoners. Finally, I implemented these results in the Vadalog System, a state-of-the-art knowledge graph management system, and conducted an experimental evaluation.
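To see how a single nondeterministic choice turns into a traversal strategy, consider the reachability program `reach(s, y) :- edge(s, y).` and `reach(s, z) :- reach(s, y), edge(y, z).`. The hand-rolled worklist evaluation below (a simplification, not Vadalog's machinery) keeps derived but not-yet-joined facts in a queue. Picking the oldest pending fact gives breadth-first exploration, and picking the newest gives depth-first exploration. The function name and the example graph are hypothetical.

```python
from collections import deque

def reach_order(edges, source, policy="bfs"):
    """Evaluate reach(source, y) for the two rules above with a worklist.
    Returns the derived targets and the order in which facts were joined."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    derived, pending, joined = set(), deque(), []

    def derive(y):
        if y not in derived:  # never re-derive a known fact
            derived.add(y)
            pending.append(y)

    for y in succ.get(source, []):  # rule 1: reach(s, y) :- edge(s, y)
        derive(y)
    while pending:
        # The chase leaves this choice open: FIFO -> breadth-first, LIFO -> depth-first.
        y = pending.popleft() if policy == "bfs" else pending.pop()
        joined.append(y)
        for z in succ.get(y, []):  # rule 2: join reach(s, y) with edge(y, z)
            derive(z)
    return derived, joined

edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e")]
bfs_facts, bfs_order = reach_order(edges, "a", "bfs")
dfs_facts, dfs_order = reach_order(edges, "a", "dfs")
print(bfs_order)  # ['b', 'c', 'd', 'e']
print(dfs_order)  # ['c', 'e', 'b', 'd']
```

Both policies reach the same fixpoint. Only the order in which facts are joined, and hence the shape of the traversal, changes. With this LIFO policy, children are visited in reverse of their listing order.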

**Evolution of AI (as of 2021) (Shrey Mishra)**

AI is a very broad field and is not just Machine Learning (although Machine Learning makes up the majority of it). When building a custom solution, other aspects come into play, and these solutions generally revolve around subtopics such as discrete optimisation, metaheuristics, deep learning, and NLP.

With so many tools and frameworks currently available, it can be hard to keep track of the right set of subtasks and tools for a given problem statement. The talk will be divided into three parts:

1. An introduction to various problem statements where different subfields of AI can be used, with examples of topics and tools and how to get started.

2. What are the recent developments in AI? A timeline of important papers and tools, and how to use them, with examples of important papers in the AI community.

3. How I applied some of these to various tasks during my master's and in my previous work.

I will try to include code snippets for some of the topics I have covered while solving real-world problems in sentiment analysis, image detection, optimisation, deploying machine learning models, and more.

Online seminar: https://bbb.di.ens.fr/b/cam-pz6-kdj

**Grammars for Document Spanners (Liat Peterfreund)**

We propose a new grammar-based language for defining information extractors from documents (text), built upon the well-studied framework of Document Spanners for extracting structured data from text. While previously studied formalisms for document spanners are mainly based on regular expressions, we use an extension of context-free grammars, called extraction grammars, to define the new class of context-free spanners. Extraction grammars are simply context-free grammars extended with variables that capture interval positions of the document, namely spans. While regular expressions are efficient for tokenizing and tagging, context-free grammars are also efficient for capturing structural properties. Indeed, we show that context-free spanners are strictly more expressive than their regular counterparts. We reason about the expressive power of our new class and present a pushdown-automata model that captures it. We show that extraction grammars can be evaluated with polynomial data complexity. Nevertheless, as the degree of the polynomial depends on the query, we present an enumeration algorithm for (a subset of) extraction grammars that, after quintic preprocessing, outputs the results sequentially, without repetitions, with a constant delay between every two consecutive ones.
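A regular spanner cannot, in general, test whether a span's content has balanced parentheses, because that requires unbounded counting. A context-free rule such as S → ( S ) S | ε expresses this directly. The Python sketch below computes every span that a hypothetical one-variable spanner built on that grammar could capture. It hard-codes the grammar as a depth counter in O(n²) time rather than evaluating an arbitrary extraction grammar, and it is not the paper's enumeration algorithm.

```python
def balanced_spans(doc):
    """All spans (i, j), 0-based and end-exclusive, such that doc[i:j] is a
    non-empty word of S -> ( S ) S | empty, i.e. balanced parentheses."""
    spans = []
    for i in range(len(doc)):
        depth = 0
        for j in range(i, len(doc)):
            if doc[j] == "(":
                depth += 1
            elif doc[j] == ")":
                depth -= 1
            else:
                break  # this sketch only accepts parenthesis characters
            if depth < 0:
                break  # more ')' than '(': no extension of this span can recover
            if depth == 0:
                spans.append((i, j + 1))
    return spans

print(balanced_spans("(()())"))  # [(0, 6), (1, 3), (1, 5), (3, 5)]
```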

**Provenance-Based Algorithms for Rich Queries over Graph Databases (Yann Ramusat)**

In this paper, we investigate the efficient computation of the provenance of rich queries over graph databases. We show that semiring-based provenance annotations enrich the expressiveness of routing queries over graphs. Several algorithms have previously been proposed for provenance computation over graphs, each yielding a trade-off between time complexity and generality. Here, we address the limitations of these algorithms and propose a new one, partially bridging a complexity and expressiveness gap and adding to the algorithmic toolkit for solving this problem. Importantly, we provide a comprehensive taxonomy of semirings and corresponding algorithms, establishing which practical approaches are needed in different cases. We implement and comprehensively evaluate several practical applications of the problem (e.g., shortest distances, top-k shortest distances, Boolean or integer path features), each corresponding to a specific semiring and to an algorithm that depends on the properties of that semiring. On several real-world and synthetic graph datasets, we show that the algorithms we propose exhibit large practical benefits for processing rich graph queries.
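The idea that one algorithm yields different path features depending on the semiring can be sketched as Bellman-Ford-style relaxation parameterised by a semiring. With (min, +) it computes shortest distances, and with (or, and) it computes reachability. This is a generic textbook scheme chosen for illustration, not one of the paper's algorithms. It assumes |V| − 1 rounds suffice, which holds for the Boolean semiring and for the tropical semiring with non-negative weights, but not for non-idempotent semirings such as path counting. The semiring names and the example graph are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any

TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0)               # shortest distance
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)  # reachability

def single_source(nodes, edges, source, sr):
    """Relax every edge (u, v, w) for |nodes| - 1 rounds:
    value[v] := value[v] (+) value[u] (x) w."""
    value = {v: sr.zero for v in nodes}
    value[source] = sr.one
    for _ in range(len(nodes) - 1):
        for u, v, w in edges:
            value[v] = sr.plus(value[v], sr.times(value[u], w))
    return value

nodes = ["s", "a", "b", "t", "z"]
weighted = [("s", "a", 1), ("s", "b", 4), ("a", "b", 2), ("b", "t", 1), ("a", "t", 6)]
print(single_source(nodes, weighted, "s", TROPICAL))
# {'s': 0, 'a': 1, 'b': 3, 't': 4, 'z': inf}
print(single_source(nodes, [(u, v, True) for u, v, _ in weighted], "s", BOOLEAN))
# {'s': True, 'a': True, 'b': True, 't': True, 'z': False}
```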

18 December 2020, 10:30-11:30

Online seminar: https://bbb.di.ens.fr/b/cam-pz6-kdj

**Preferred repairs over inconsistent knowledge bases and connections to argumentation**

A fundamental notion when reasoning over inconsistent knowledge bases is that of a repair, defined as a maximal subset of the data that is consistent w.r.t. the expressed knowledge. Various forms of preferred repairs have been introduced for identifying the most relevant repairs based upon information about the reliability of different facts. In this talk, we will introduce the different kinds of preferred repairs that have been proposed in the database and knowledge representation literature and review what is known about the complexity of reasoning with such repairs. We will also present some interesting connections with the area of abstract argumentation, which inspired a new notion of preferred repair with desirable computational properties.
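For small examples, both plain repairs and one standard kind of preferred repair can be computed by brute force. The Python sketch below assumes that inconsistency arises only from conflicting pairs of facts (for example, a key constraint). It uses Pareto-optimality as the preference criterion: a repair R is discarded if some consistent set B contains a fact that is preferred over every fact of R that B drops. This criterion may not match the new notion introduced in the talk. The facts, conflicts, and preference relation are hypothetical, and enumerating all subsets is exponential, so this is for illustration only.

```python
from itertools import combinations

def consistent_subsets(facts, conflicts):
    """All subsets of `facts` that contain no conflicting pair (brute force)."""
    facts = list(facts)
    return [frozenset(c) for r in range(len(facts) + 1)
            for c in combinations(facts, r)
            if not any(set(pair) <= set(c) for pair in conflicts)]

def subset_repairs(facts, conflicts):
    """Repairs: consistent subsets that are maximal w.r.t. set inclusion."""
    cands = consistent_subsets(facts, conflicts)
    return {c for c in cands if not any(c < d for d in cands)}

def pareto_repairs(facts, conflicts, prefer):
    """Repairs R with no consistent B holding a fact preferred over all of R - B."""
    cands = consistent_subsets(facts, conflicts)
    return {r for r in subset_repairs(facts, conflicts)
            if not any(all((f, g) in prefer for g in r - b)
                       for b in cands for f in b - r)}

f1 = ("worksIn", "ann", "sales")
f2 = ("worksIn", "ann", "hr")
f3 = ("worksIn", "bob", "hr")
conflicts = [(f1, f2)]  # key constraint: one department per employee
prefer = {(f1, f2)}     # f1 is considered more reliable than f2
print(len(subset_repairs([f1, f2, f3], conflicts)))        # 2
print(pareto_repairs([f1, f2, f3], conflicts, prefer))     # only the repair {f1, f3}
```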

This talk is primarily based upon a KR 2020 paper co-authored with Camille Bourgaux.

Bio:

Meghyn Bienvenu is a CNRS researcher and member of the LaBRI laboratory at the University of Bordeaux.

Her research interests span a range of topics in knowledge representation and reasoning and database theory, with a main focus on description logic ontologies and their use in querying data. She currently leads an ANR AI Chair on the topic of intelligent handling of imperfect data. Bienvenu is an associate editor of ACM Transactions on Computational Logic and will serve as PC co-chair for KR 2021, the leading conference on knowledge representation and reasoning. Her research has been recognized by an invited Early Career Spotlight talk at IJCAI’16, the world’s premier AI conference, and the 2016 CNRS Bronze Medal in the area of computer science.

21 February 2020, 10:00-11:00

ENS, S16

**Provenance Analysis for First-order Model Checking**

Is a given finite structure a model of a given first-order sentence? The provenance analysis of this question determines how its answer depends on the atomic facts that determine the structure. Provenance questions like this one have emerged in databases, scientific workflows, networks, formal verification, and other areas. In joint work with Erich Grädel (RWTH Aachen University), we extend the semiring provenance framework, developed in databases, to the first-order model checking problem. This provides a non-standard semantics for first-order logic that refines logical truth to values in commutative semirings: a semiring of provenance polynomials, the Viterbi semiring of confidence scores, access control semirings, etc. The semantics can be used to synthesize models based on criteria like maximum confidence or public access. Our uniform treatment of logical negation can be used to explain missing answers for queries and failures of integrity constraints, as well as to compute corresponding repairs that fix these issues. The work on repairs is also joint with Abdu Alawini, Jane Xu, and Waley Zhang (Penn).
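Refining truth to semiring values can be sketched for negation-free sentences with the following rules:

- An atom takes the annotation of the corresponding fact (or zero if the fact is absent).
- ∧ is semiring multiplication, and ∨ is semiring addition.
- ∃ sums over the universe, and ∀ multiplies over it.

The Python sketch below uses the Viterbi semiring mentioned above, plus the natural-number semiring, which counts witnessing assignments. It does not model the talk's treatment of negation, which is what enables explanations of missing answers. The formula encoding, the names, and the data are hypothetical.

```python
from collections import namedtuple
from functools import reduce

Semiring = namedtuple("Semiring", "plus times zero one")
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)              # confidence scores
COUNTING = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)  # number of witnesses

def prov(phi, universe, ann, sr, env=None):
    """Semiring value of a negation-free formula. Formulas are tuples:
    ("atom", R, t1, ...), ("and", p, q), ("or", p, q), ("exists", x, p), ("forall", x, p).
    Facts absent from `ann` are annotated with sr.zero."""
    env = env or {}
    kind = phi[0]
    if kind == "atom":
        fact = (phi[1],) + tuple(env.get(t, t) for t in phi[2:])
        return ann.get(fact, sr.zero)
    if kind in ("and", "or"):
        op = sr.times if kind == "and" else sr.plus
        return op(prov(phi[1], universe, ann, sr, env), prov(phi[2], universe, ann, sr, env))
    if kind in ("exists", "forall"):
        op, unit = (sr.plus, sr.zero) if kind == "exists" else (sr.times, sr.one)
        values = (prov(phi[2], universe, ann, sr, {**env, phi[1]: a}) for a in universe)
        return reduce(op, values, unit)
    raise ValueError(f"unsupported connective: {kind}")

universe = ["a", "b", "c"]
confidence = {("E", "a", "b"): 0.5, ("E", "b", "a"): 0.5,
              ("E", "b", "c"): 0.75, ("E", "c", "b"): 0.5}
# exists x exists y. E(x, y) and E(y, x)
mutual = ("exists", "x", ("exists", "y",
          ("and", ("atom", "E", "x", "y"), ("atom", "E", "y", "x"))))
print(prov(mutual, universe, confidence, VITERBI))                   # 0.375
print(prov(mutual, universe, {f: 1 for f in confidence}, COUNTING))  # 4
```

In the Viterbi semiring, the value is the confidence of the best witness (the pair b, c with 0.75 × 0.5). In the counting semiring, it is the number of satisfying assignments.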

28 February 2020, 10:30-11:30

ENS, S16

**The Complexity of Answering Unions of Conjunctive Queries**

We discuss the complexity of enumerating (listing) the answers to a query over a relational database. In particular, we consider three variants: arbitrary order, uniformly random order, and random access. We focus on the class of join queries: Conjunctive Queries (CQs) and Unions of Conjunctive Queries (UCQs), and on the ability to list the answers with linear preprocessing and logarithmic time per answer. A known dichotomy classifies CQs into those that admit such enumeration and those that do not. I will talk about my research towards extending this dichotomy to UCQs. This generalization turns out to be quite challenging. For example, a union of tractable CQs may be intractable w.r.t. random access; on the other hand, a union of intractable CQs may be tractable w.r.t. enumeration.
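The simplest tractable case is a two-atom acyclic join Q(x, y, z) :- R(x, y), S(y, z). Linear preprocessing followed by constant-delay enumeration can be sketched as follows: index S by the join variable, discard R-tuples with no partner, then stream the answers. The constant delay assumes constant-time hash lookups. The relation names and data are hypothetical. This sketch does not address unions, where the difficulty the abstract describes appears, for example because the same answer may be produced by several CQs and must not be repeated.

```python
from collections import defaultdict

def preprocess(R, S):
    """Linear-time preprocessing for Q(x, y, z) :- R(x, y), S(y, z)."""
    index = defaultdict(list)
    for y, z in S:
        index[y].append(z)
    live = [(x, y) for x, y in R if y in index]  # semi-join: drop dangling R-tuples
    return live, index

def enumerate_answers(live, index):
    """Every live R-tuple yields at least one answer, so the delay between
    consecutive answers is constant (assuming O(1) hash lookups)."""
    for x, y in live:
        for z in index[y]:
            yield (x, y, z)

R = [(1, "a"), (2, "b"), (3, "c")]
S = [("a", 10), ("a", 11), ("c", 30)]
print(list(enumerate_answers(*preprocess(R, S))))
# [(1, 'a', 10), (1, 'a', 11), (3, 'c', 30)]
```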

Bio:

Nofar Carmeli is a Ph.D. student in the Data and Knowledge group at Technion, Israel Institute of Technology, advised by Prof. Benny Kimelfeld. Her research focuses on query optimization with guarantees using enumeration techniques. Nofar completed her BSc in 2015 in the Lapidim excellence program of the Computer Science Department of Technion.

Associate Professor | School of Computing Science

Simon Fraser University 8888 University Dr., Burnaby, B.C. V5A 1S6

13 February 2020, 10:30-11:30

ENS, S16

**Logic of Information Flows: Expressing Reachability and Cardinality Properties**

A challenge in descriptive complexity is to identify logics with low complexity that simultaneously express fundamental reachability and counting properties on unordered structures. We define a family of logics that allow fine control of the complexity of order-independent computations. The approach is based on adding the ability to reason about information propagations in first-order logic with fixed points, FO(FP). Two parameters are used to control expressiveness and complexity: the set of logical connectives and the expressiveness of atomic units. We restrict both and obtain a modal temporal (Dynamic) logic over monadic unions of conjunctive queries. A crucial component is a dynamic version of Hilbert’s Epsilon operator. We identify a fragment with polytime data complexity and show that it can express both reachability and counting properties on unordered structures. Finally, we formalize an Epsilon-invariance property for this logic and conjecture its decidability.

7 February 2020, 10:30-11:30

ENS, S16

**An Experimental Study of the Treewidth of Real-World Data**

Treewidth is a parameter that measures how tree-like a data instance is, and whether it can reasonably be decomposed into a data structure resembling a tree.

Many computational tasks are known to be tractable on data having small treewidth, but computing the treewidth of a given instance is intractable. This talk presents the first large-scale experimental study of treewidth and tree decompositions of real-world data, with a focus on graph data. We aim to find out which data, if any, can benefit from the wealth of algorithms for data of small treewidth. For each dataset, we obtain upper and lower bound estimates of its treewidth and study the properties of its tree decompositions. We show in particular that, even when treewidth is high, using partial tree decompositions can result in data structures that can assist algorithms.
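Two classic, cheap estimators show what upper and lower bounds on treewidth look like:

- **Lower bound (degeneracy):** a graph of treewidth k is k-degenerate, so the largest minimum degree seen while repeatedly deleting a minimum-degree vertex cannot exceed the treewidth.
- **Upper bound (min-degree elimination):** eliminating a minimum-degree vertex and turning its neighbours into a clique yields an elimination ordering whose width bounds the treewidth from above.

These textbook heuristics are chosen for illustration, and the study may use different or stronger ones. The function names are hypothetical, and the input must be a symmetric adjacency map.

```python
def degeneracy(adj):
    """Treewidth lower bound: max over the deletion sequence of the minimum degree."""
    g = {v: set(ns) for v, ns in adj.items()}
    best = 0
    while g:
        v = min(g, key=lambda u: len(g[u]))
        best = max(best, len(g[v]))
        for u in g[v]:
            g[u].discard(v)
        del g[v]
    return best

def min_degree_width(adj):
    """Treewidth upper bound: width of the min-degree elimination ordering."""
    g = {v: set(ns) for v, ns in adj.items()}
    width = 0
    while g:
        v = min(g, key=lambda u: len(g[u]))
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        for u in nbrs:
            g[u].discard(v)
            g[u] |= nbrs - {u}  # fill-in: neighbours of v become a clique
    return width

cycle = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
clique = {i: {j for j in range(4) if j != i} for i in range(4)}
print(degeneracy(cycle), min_degree_width(cycle))    # 2 2
print(degeneracy(clique), min_degree_width(clique))  # 3 3
```

When the two bounds coincide, as they do here, the treewidth is known exactly. On real-world graphs, the gap between them is what such a study has to live with.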

29 November 2019, 10:30-11:30

ENS, S16

**A Brief History of Knowledge Graph’s Main Ideas**

Knowledge Graphs can be considered to be fulfilling an early vision in Computer Science of creating intelligent systems that integrate knowledge and data at large scale. The term “Knowledge Graph” has rapidly gained popularity in academia and industry since Google popularized it in 2012. It is paramount to note that, regardless of the discussions on, and definitions of, the “Knowledge Graph” term, it stems from scientific advancements in diverse research areas such as the Semantic Web, Databases, Knowledge Representation and Reasoning, NLP, and Machine Learning, among others.

The integration of ideas and techniques from such disparate disciplines gives the notion of Knowledge Graph its richness, but at the same time makes it challenging for practitioners and researchers to know how current advances develop from, and are rooted in, early techniques.

In this talk, Juan will provide a historical context on the roots of Knowledge Graphs grounded in the advancements of the computer science disciplines of Knowledge, Data and the combination thereof, starting from the 1950s.

For more details, please read the paper: http://knowledgegraph.today/paper.html

Bio: Juan F. Sequeda is the Principal Scientist at data.world. He joined through the acquisition of Capsenta, a company he founded as a spin-off from his research. He holds a PhD in Computer Science from The University of Texas at Austin.

Juan is the recipient of the NSF Graduate Research Fellowship, received 2nd Place in the 2013 Semantic Web Challenge for his work on ConstituteProject.org, Best Student Research Paper at the 2014 International Semantic Web Conference, the 2015 Best Transfer and Innovation Project awarded by the Institute for Applied Informatics and nominated for best papers multiple times. Juan is on the Editorial Board of the Journal of Web Semantics, member of multiple program committees (ISWC, ESWC, WWW, AAAI, IJCAI). He was the General Chair of AMW2018, PC chair of ISWC 2017 In-Use track, co-creator of COLD workshop (7 years co-located at ISWC). He has served as a bridge between academia and industry as the current chair of the Property Graph Schema Working Group, member of the Graph Query Languages task force of the Linked Data Benchmark Council (LDBC) and past invited expert member and standards editor at the World Wide Web Consortium (W3C).

Wearing his scientific hat, Juan’s goal is to reliably create knowledge from inscrutable data. His research interests lie at the intersection of Logic and Data for (ontology-based) data integration and semantic/graph data management, and what is now called Knowledge Graphs.

Wearing his business hat, Juan is a product manager, does business development, strategy, and technical sales, and works with customers to understand their problems and translate them back to R&D.

ENS, S16

**Order-invariant first-order logic over hollow trees**

Order-invariant first-order logic is the extension of first-order logic in which the use of an additional ordering relation on the structure’s universe is allowed, provided that the evaluation of sentences is independent of the choice of a particular order. We show that the expressive power of order-invariant first-order logic collapses to that of first-order logic over hollow trees. A hollow tree is an unranked ordered tree where every non-leaf node has at most four adjacent nodes: two siblings (left and right) and its first and last children. In particular, there is no predicate for the linear order among siblings nor for the descendant relation. Moreover, only the first and last nodes of a siblinghood are linked to their parent node, and the parent-child relation cannot be completely reconstructed in first-order logic.
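The hollow-tree signature can be made concrete by computing, from an ordinary ordered tree, the only relations it keeps: immediate right-sibling pairs (the left-sibling relation is their inverse) and first-child and last-child edges. The Python sketch below assumes unique node names, and its encoding as sets of pairs is an illustrative choice. It shows that a middle child keeps no edge to its parent and that only the immediate sibling successor, not the full sibling order, is present.

```python
def hollow_relations(tree):
    """tree = (name, [subtrees]) with unique names. Returns right-sibling pairs
    and first-child / last-child edges; middle children get no parent edge."""
    right, first, last = set(), set(), set()

    def walk(node):
        name, kids = node
        names = [k[0] for k in kids]
        if names:
            first.add((name, names[0]))
            last.add((name, names[-1]))
        right.update(zip(names, names[1:]))  # immediate successor only, no linear order
        for kid in kids:
            walk(kid)

    walk(tree)
    return right, first, last

t = ("r", [("a", []), ("b", [("d", []), ("e", [])]), ("c", [])])
right, first, last = hollow_relations(t)
print(sorted(right))  # [('a', 'b'), ('b', 'c'), ('d', 'e')]
print(sorted(first))  # [('b', 'd'), ('r', 'a')]
print(sorted(last))   # [('b', 'e'), ('r', 'c')]
```

Node `b` reaches its parent `r` only by walking along its siblings to `a` or `c`. That walk has unbounded length in general, which is why the parent-child relation cannot be fully recovered in first-order logic.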