### Overall objectives

The VeriDis project team includes members of the MOSEL group at LORIA, the computer science laboratory in Nancy, and members of the research group Automation of Logic at Max-Planck-Institut für Informatik in Saarbrücken. It is headed by Stephan Merz and Christoph Weidenbach. VeriDis was created in 2010 as a local research group of Inria Nancy – Grand Est and has been an Inria project team since July 2012.

The objectives of VeriDis are to contribute to advances in verification techniques, including automated and interactive theorem proving, and to make them available for the development and analysis of concurrent and distributed algorithms and systems, based on mathematically precise and practically applicable development methods. The techniques that we develop are intended to assist designers of algorithms and systems in carrying out formally proved developments, where proofs of relevant properties, as well as bugs, can be found with a high degree of automation.

Within this context, we work on techniques for automated theorem proving for expressive languages based on first-order logic, with support for theories (fragments of arithmetic, set theory etc.) that are relevant for specifying algorithms and systems. Ideally, systems and their properties would be specified in high-level, expressive languages, errors in specifications would be discovered automatically, and finally, full verification could also be performed completely automatically. Due to the fundamental undecidability of the problem, this cannot be achieved in general. Nevertheless, we have observed important advances in automated deduction in recent years, to which we have contributed. These advances suggest that a substantially higher degree of automation can be achieved over what is available in today’s tools supporting deductive verification. Our techniques are developed within SMT (satisfiability modulo theories) solving and superposition reasoning, the two main frameworks of contemporary automated reasoning that have complementary strengths and weaknesses, and we are interested in making them converge when appropriate. Techniques developed within the symbolic computation domain, such as algorithms for quantifier elimination for appropriate theories, are also relevant, and we are working on integrating them into our portfolio of techniques. In order to handle expressive input languages, we are working on techniques that encompass tractable fragments of higher-order logic, for example for specifying inductive or co-inductive data types, for automating proofs by induction, or for handling collections defined through a characteristic predicate.

Since full automatic verification remains elusive, another line of our research targets interactive proof platforms. We intend these platforms to benefit from our work on automated deduction by incorporating powerful automated backends and thus raise the degree of automation beyond what current proof assistants can offer. Since most conjectures stated by users are initially wrong (due to type errors, omitted hypotheses or overlooked border cases), it is also important that proof assistants be able to detect and explain such errors rather than letting users waste considerable time in futile proof attempts. Moreover, increased automation must not come at the expense of trustworthiness: skeptical proof assistants expect to be given an explanation of the proof found by the backend prover that they can certify.

Our methodological and foundational research is accompanied by the development of efficient software tools, several of which go beyond pure research prototypes: they have been used by others, have been integrated in proof platforms developed by other groups, and participate in international competitions. We also validate our work on proof techniques by applying them to the formal development of algorithms and systems. We mainly target high-level descriptions of concurrent and distributed algorithms and systems. This class of algorithms is by now ubiquitous, ranging from multi- and many-core algorithms to large networks and cloud computing, and their formal verification is notoriously difficult. Targeting high levels of abstraction allows the designs of such systems to be verified before an actual implementation has been developed, contributing to reducing the costs of formal verification. The potential of distributed systems for increased resilience to component failures makes them attractive in many contexts, but also makes formal verification even more important and challenging. Our work in this area aims at identifying classes of algorithms and systems for which we can provide guidelines and identify patterns of formal development that makes verification less an art and more an engineering discipline. We mainly target components of operating systems, distributed and cloud services, and networks of computers or mobile devices.

Beyond formal verification, we pursue applications of some of the symbolic techniques that we are developing in other domains. We have observed encouraging success in using techniques of symbolic computation for the qualitative analysis of biological and chemical networks described by systems of ordinary differential equations that were previously only accessible to large-scale simulation. Such networks include biological reaction networks as they occur with models for diseases such as diabetes or cancer. They furthermore include epidemic models such as variants and generalizations of SEIR models, which are typically used for Influenza A or Covid-19. This work is being pursued within a large-scale interdisciplinary collaboration. It aims for our work grounded in verification to have an impact on the sciences, beyond engineering, which will feed back into our core formal methods community.

### Last activity report : 2020

- 2020 : PDF – HTML
- 2019 : PDF – HTML
- 2018 : PDF – HTML
- 2017 : PDF – HTML
- 2016 : PDF – HTML
- 2015 : PDF – HTML
- 2014 : PDF – HTML
- 2013 : PDF – HTML
- 2012 : PDF – HTML

## Results

#### New results

#### Automated and Interactive Theorem Proving

#### Contributions to SMT Techniques

A satisfiability problem is often expressed in a combination of theories, and a natural approach consists in solving the problem by combining the satisfiability procedures available for the component theories. This is the purpose of the combination method introduced by Nelson and Oppen. However, in its initial presentation, the Nelson-Oppen combination method requires the theories to be signature-disjoint and stably infinite. The design of a generic combination method for non-disjoint unions of theories is difficult, but it is worth exploring simple non-disjoint combinations that appear frequently in verification. An example is the case of shared sets, where sets are represented by unary predicates. Another example is the case of bridging functions between data structures and a target theory (e.g., a fragment of arithmetic).

In 2015, we defined a sound and complete combination procedure à la Nelson-Oppen for the theory of absolutely free data structures (including lists and trees) connected to another theory via bridging functions . This combination procedure has also been refined for standard interpretations. The resulting theory has a nice politeness property, enabling combinations with arbitrary decidable theories of elements. We also investigated other theories amenable to similar combinations: this class includes the theory of equality, the theory of absolutely free data structures, and all the theories in between.

In 2018 and 2019, we have been improving the framework and unified both results. This was published in the Journal of Automated Reasoning in 2020 .

The above works pave the way for combinations involving the theory of algebraic datatypes as found in the SMT-LIB. Together with colleagues in Iowa and Stanford, this was published at IJCAR 2020 . This article received a best paper award.

SMT solvers generally rely on various instantiation techniques for handling quantifiers. We built a unifying framework encompassing quantified formulas with equality and uninterpreted functions, such that the major instantiation techniques in SMT solving can be cast in that framework. It is based on the problem of E-ground (dis)unification, a variation of the classic Rigid E-unification problem. We introduced a sound and complete calculus to solve this problem in practice: Congruence Closure with Free Variables (CCFV). Experimental evaluations of implementations of CCFV demonstrate notable improvements in the state-of-the-art solver CVC4 and make the solver veriT competitive with state-of-the-art solvers for several benchmark libraries, in particular those originating in verification problems.

In 2019 and 2020, we investigated machine learning techniques for predicting the usefulness of an instance in order to decrease the number of instances passed to the SMT solver. For this, we proposed a meaningful way to characterize the state of an SMT solver, to collect instantiation learning data, and to integrate a predictor in the core of a state-of-the-art SMT solver. This ultimately leads to more efficient SMT solving for quantified problems.

SMT solvers have throughout the years been able to cope with increasingly expressive formulas, from ground logics to full first-order logic (FOL). In contrast, the extension of SMT solvers to higher-order logic (HOL) was mostly unexplored. We proposed a pragmatic extension for SMT solvers to support HOL reasoning natively without compromising performance on FOL reasoning, thus leveraging the extensive research and implementation efforts dedicated to efficient SMT solving. We showed how to generalize data structures and the ground decision procedure to support partial applications and extensionality, as well as how to reconcile quantifier instantiation techniques with higher-order variables. We also discussed a separate approach for redesigning an SMT solver for higher-order logic from the ground up via new data structures and algorithms. We applied our pragmatic extension to the CVC4 SMT solver and discussed a redesign of the veriT SMT solver. Our evaluation showed that they are competitive with state-of-the-art HOL provers and often outperform the traditional encoding into FOL.

This result was published at CADE 2019 . Work in 2020 focused on extending the CCFV algorithm to higher-order logic : the first-order algorithm is not directly usable since it strongly relies on the fact that functions are fully applied, and no variable can appear in a function place. It is also necessary to find a radically different approach. Our approach is to work on an encoding of the CCFV higher-order problem into SAT.

We have previously developed a framework for processing formulas in automatic theorem provers, with generation of detailed proofs that can be checked by external tools, including skeptical proof assistants. The main components are a generic contextual recursion algorithm and an extensible set of inference rules. Clausification, skolemization, theory-specific simplifications, and expansion of `let’ expressions are instances of this framework. With suitable data structures, proof generation adds only a linear-time overhead, and proofs can be checked in linear time. We implemented the approach in the SMT solver veriT. This allowed us to dramatically simplify the code base while increasing the number of problems for which detailed proofs can be produced. In 2019, the format of proof output was further improved, while also improving the reconstruction procedure in the proof assistant Isabelle/HOL. This allowed the tactic using SMT with proofs to be regularly suggested by Sledgehammer as the fastest method to automatically solve proof goals. This was the subject of a workshop publication in 2019 . In 2020, we have made steady progress on this front, and thanks to this progress, the veriT solver has been integrated into Isabelle with full support of reconstruction for veriT proof. This lead to improvements in the Sledgehammer facility to automatically discharge Isabelle proofs.

We are part of a group developing a framework for formal refutational completeness proofs of abstract provers that implement automated reasoning calculi, such as CDCL (Conflict Driven Clause Learning), ordered resolution, or superposition.

For CDCL we have been able to derive the state-of-the-art algorithms from a simple, verified core, inheriting its properties via instantiation. Then we could further refine this setting up to the generation of executable code that performs surprisingly well compared to hand-coded CDCL implementations .

The framework relies on modular extensions of lifted redundancy criteria that underlie the deletion of subsumed formulas. In presentations of proof calculi, this aspect is usually only discussed informally. Our framework allows us to extend redundancy criteria so that they cover subsumption, and also to model entire prover architectures in such a way that the static refutational completeness of a calculus immediately implies the dynamic refutational completeness of a prover implementing the calculus, for instance within an Otter or DISCOUNT loop. Our framework is mechanized in Isabelle/HOL. This research was presented at IJCAR 2020 .

The Coq proof assistant is powerful enough to reproduce cyclic reasoning for first-order logic with inductive definitions (FOLCyclist proofs, including a proof for the 2-Hydra problem, and non-trivial cyclic SPIKE proofs of conjectures about conditional specifications.

Abduction is the process of explaining new observations using background knowledge. It is central to knowledge discovery and knowledge processing and has been intensely studied in various domains such as artificial intelligence, philosophy and logic.

Signature-based abduction aims at building hypotheses over a specified set of names, the signature, that explain an observation relative to some background knowledge. This type of abduction is useful for tasks such as diagnosis, where the vocabulary used for observed symptoms differs from the vocabulary expected to explain those symptoms. In the description logic literature, abduction has received little attention, despite being recognised as important for ontology repair, query update and matchmaking.

S. Tourret, together with P. Koopmann, W. Del-Pinto and R. Schmidt, presented the first complete method solving signature-based abduction for observations expressed in the expressive description logic

In joint work with P. Koopmann, we are currently investigating an alternative approach to abduction for description logics based on a translation to first-order logic and back. This work is motivated by the recent development of efficient tools for abductive reasoning in first-order logic.

This notion of semi-relevance is particularly interesting and can be detected using SOS-resolution derivations. We are currently working on a generalized proof of the SOS strategy for resolution to validate the theory behind the tests for semi-relevance in first-order logic.

Inductive Logic Programming (ILP) is a form of machine learning that induces hypotheses from examples and background knowledge. Many forms of ILP use second-order Horn clauses as templates, also denoted as meta-rules, to learn logic programs, and several of them rely on SLD resolution to produce new candidate solutions. Determining which meta-rules to use for a given learning task is a major open problem in ILP and most approaches use clauses provided by the designers of the systems without any theoretical justifications.

In joint work with A. Cropper we formalized the derivation reduction problem for SLD resolution, the undecidable problem of finding a finite subset of a set of clauses from which the whole set can be derived using SLD resolution. We studied the reducibility of various fragments of second-order Horn logic that are relevant in ILP and extended our results to standard resolution. We also conducted an empirical study of the effects of using reduced sets of such metarules on the overall learning accuracy and time, which shows a substantial improvement over the state of the art, in addition to the theoretical guarantees offered.

We reconsider the encoding of proof obligations that arise in proofs about TLA+ specifications in multi-sorted first-order logic. In his PhD thesis, Antoine Defourné studies correctness criteria for assigning types to TLA+ expressions, based on embeddings between models for different logics. The objective is to delineate what type assignments are sound when translating from the untyped TLA+ language into the multi-sorted logics underlying typical automated reasoning engines.

He also implemented a new back-end reasoner based on Zipperposition for handling proof obligations that involve features of higher-order logic such as function or predicate variables that may appear in TLA+ proof sequents. The design of this back-end was presented at JFLA 2020 , and a working prototype at the TLA+ Community Meeting .

During his undergraduate internship, Raphaël Le Bihan revisited the coalescing technique used in TLAPS for separating first-order and modal reasoning.

#### Contributions to Formal Methods of System Design

Refinement of a specification expressed at a high level of abstraction by a lower-level specification is a fundamental concept in formal system development. A key problem in proving refinement is to demonstrate that suitable values of internal variables of the high-level specification can be assigned to every possible execution of the low-level specification. The standard technique for doing so is to exhibit a refinement mapping where values for these variables are computed for each state, but it is also well known that this technique is not complete. In joint work with Leslie Lamport (Microsoft Research), we revisit the classic paper that introduced constructions for auxiliary variables in order to strengthen the refinement mapping technique. In particular, we introduce simpler rules for defining prophecy variables and demonstrate how they can be used for proving the correctness of an algorithm implementing a linearizable object. We also show that our constructions of auxiliary variables yield a complete proof method. An article based on this work has been submitted for publication to a journal.

PlusCal is a language for describing algorithms. It has the look and feel of pseudo-code, but also has a formal semantics through a translation to TLA+ specifications. During her master internship, Heba Al Kayed extended the PlusCal language and translator so that it is more suitable for modeling distributed algorithms. As a first extension, parallel processes may have several code blocks that represent threads communicating through local variables. Second, communication channels are introduced as first-class entities, together with send, multicast, and receive operations. This work was presented at the TLA+ Community Meeting .

When interactive systems allow users to interact with critical systems, they are qualified as Critical Interactive Systems. Their design requires the support of different activities and tasks to achieve user goals. Examples of such systems are cockpits, nuclear plant control panels, medical devices, etc. Such critical systems are very difficult to model due to the complexity of the offered interaction capabilities. In joint work with Ismaël Mendil, Neeraj Kumar Singh, Yamine Aït-Ameur, and Philippe Palanque (IRIT Toulouse), we present a formal framework, F3FLUID (Formal Framework For FLUID), for designing safety-critical interactive systems. It relies on FLUID as the core modelling language. FLUID enables modelling and using interactive systems domain concepts and supports an incremental design of such systems. Formal verification, validation and animation of the designed models are supported through different transformations of FLUID models into target formal verification techniques: Event-B for formal verification, ProB model checker for animation and Interactive Cooperative Objects for user validation. The Event-B models are generated from FLUID while ICO and ProB models are produced from Event-B. We exemplify the real-life case study TCAS (Traffic alert and Collision Avoidance System) to demonstrate our framework.

In joint work with Markus Kroetzsch and Christof Fetzer (Technical University of Dresden), we have introduced a logical fragment called SUPERLOG (Supervisor Logic) that is meant to provide a basis for formalizing abstract control algorithms found in ECUs (Electronical Control Unit), plus fully automated verification, plus execution . Technically, the language is an extension of the first-order Bernays-Schoenfinkel fragment with arithmetic constraints. It extends the well known SMT fragment by universally quantified variables. We have developed a sound and complete calculus for the SUPERLOG language . The calculus supports non-exhaustive propagation and can therefore be a role model for other calculi where exhaustive propagation cannot be afforded . Based on the decidability results obtained by Marco Voigt , we are working on fully automatic verification approaches for fragments of the SUPERLOG language. One line of research is to “hammer” verification conditions in SUPERLOG fragments to DATALOG for efficient solving. The other is to use abstractions guiding the search of the calculus, related to our abstraction refinement approach .

We completed in 2020 our work on developing a prototype tool for performing statistical model checking within the SimGrid framework. The goal was to give users the opportunity, in one single framework, to take advantage of both verification and simulation possibilities. To do so, we added to SimGrid the possibility to use stochastic profiles, introducing probabilities in the model of the network. The prototype tool can be interfaced with the SimGrid simulator to perform statistical model checking on the actual programs simulated using the SimGrid framework. The prototype was evaluated on examples such as the Bit Torrent protocol in which we added a probabilistic model of node failures. This work resulted in a publication at the SIMULTECH conference .

Hybrid systems are characterized by the interaction of continuous dynamics and discrete control. As hybrid systems become ubiquitous and more and more complex, analysis and synthesis techniques for designing safe hybrid systems are in high demand. This is however challenging due to the nature of hybrid systems and their designs, and the question of how to formulate and reason about their safety problems. Previous work has demonstrated how to extend the discrete modeling language Event-B with support for continuous operators and how to integrate traditional refinement in hybrid system design. In the same spirit, we extend previous work by proposing a strategy that can coherently refine an abstract hybrid system design with safety constraints down to the concrete one with implementable discrete control that can behave safely. Our proposal is validated on the design of a smart heating system.

Our work in toricity of steady state ideals of biomodels from 2019 has been accepted for journal publication and will appear during 2021. The approach there was to automatically recognize relevant geometric structure of steady state varieties in QF_NRA is a possible alternative, which has not been studied systematically. This year we managed to treat also the case of complex numbers on a purely logical basis , . Based on arguments from algebraic model theory, this also gives insights into the interdependencies of the occurrences of relevant geometric structures over the complex numbers versus the real numbers.

Geometric toricity of a variety resembles the algebraic concept of binomiality of the corresponding polynomial ideal. Generalizing that well-studied binomiality concept of chemical reaction networks, in , unconditional binomiality has been defined, its properties have been investigated and a linear algebra approach has been given for testing unconditional binomiality in the case of reversible reactions. A graph theoretical version of the linear algebra approach has been presented in .

In

we address, on the one hand, the simpler question whether or not there is are unique steady states, without going into details on the exact geometry. On the other hand, we do so in dependence on

parametric

reaction rates, so that the results are necessary and sufficient formal logical conditions in the Tarski Algebra. Again, the underlying methods are of logical nature, mostly real elimination methods like virtual substitution, cylindrical algebraic decomposition, and real triangular sets.

Our interdisciplinary work

in computer science, mathematics, and systems biology is concerned with the reduction of a system of ordinary differential equations (in time) into several simpler subsystems, each corresponding to a certain orders of magnitude of velocities, also called time scales, of the corresponding differential variables. To our knowledge this is the first mathematically rigorous approach for reaction networks that allows for multiple time scales. Previous work either did not give any formal guarantees on the obtained results, or was limited to only two different time scales. The computation is based on massive SMT solving over various theories, including

QF_LRA

for tropicalizations,

QF_NRA

for testing Hurwitz conditions on eigenvalues, and

QF_LIA

for finding sufficient differentiability conditions for hyperbolic attractivity of critical manifolds. Gröbner reduction techniques are used for final algebraic simplification.

As an example consider a model related to the transmission dynamics of subtype H5N6 of the avian Influenza A virus in the Philippines in August 2017 . That model is identified as BIOMD0000000716 in the BioModels database, a repository of mathematical models of biological processes . The model specifies four species: S_b (susceptible bird), I_b (infected bird), S_h (susceptible human), and I_h (infected human), the concentrations of which over time we denote by differential variables

Our approach reduces this to three systems

Notice the explicit constant factors on the right hand sides of the differential equations. We see that the system

This multiple time scale reduction of the bird flu model emphasizes a cascade of successive relaxations of different model variables. First, the population of susceptible birds relaxes, meaning that these variables reach quasi-steady state values. This relaxation is illustrated in Fig. (b). Then the population of infected birds relaxes as shown in Fig. (c). Finally, the populations of susceptible and infected humans relax to a stable steady state as shown in Fig. (d), following a reduced dynamics described by

Implicit differential equations, i.e. equations which are not solved for a derivative of highest order, appear in many applications. In particular, the so-called differential algebraic equations may be considered as a special case of implicit equations. Compared with equations in solved form, implicit equations are more complicated to analyze and show a much wider range of phenomena. Already basic questions about the existence and uniqueness of solutions of an initial value problem become much more involved. One reason is the possible appearance of singularities.

A key novelty of our approach is to consider the decisive linear system determining the Vessiot spaces first, independently of the given differential system. This allows us to make maximal use of the linearity and to apply a wide range of heuristic optimizations. Compared with the more comprehensive approach of , this also leads to an increased flexibility and we believe that the new approach will be in general more efficient in the sense that fewer cases will be returned.