Projects


H2020 Twinning Project OLISSIPO

Coordinator: Susana Vinga, INESC-ID, Instituto Superior Técnico, Lisbon, Portugal.
Coordinator in ERABLE Team at Inria: Marie-France Sagot.
Duration: 2021-2024

Brief description
OLISSIPO is an H2020 Twinning project coordinated by INESC-ID. The project focuses on Computational Biology, a strongly interdisciplinary area that combines Computer Science, Algorithms, Mathematics, Probability and Statistics, Machine Learning, Molecular Biology and Medicine. The project consortium is composed of INESC-ID (Coordinator), the National Institute for Research in Digital Science and Technology (Inria) through the Erable team, the Swiss Federal Institute of Technology (ETH Zurich) and the European Molecular Biology Laboratory (EMBL). The main goal of the project is to intensify, increase and consolidate the research in Computational Biology carried out at INESC-ID in partnership with its European partner institutions.

ITMO Cancer Mathematics and Computer Science Program project MITOTIC

Coordinator: Sabine Peres.
Other participants: Renaud Dentin, Institut Cochin, Inserm; Vincent Fromion, MaIAGE Team, INRAE.
Full title: Resource Balance Analyses to uncover metabolic vulnerability in cancer and identify new therapies
Duration: 2021-2024

ANR JCJC project PIECES

Coordinator: Laurent Jacob.
Duration: 2021-2024

Brief description
Genetic variation can have causal effects on a variety of phenotypes, ranging from human health risks to bacterial drug resistance and crop yield. Unraveling the relationship between genotypes and phenotypes is therefore crucial for both basic and applied science. In computational biology and statistics, genomes have historically been treated as small variations around a reference sequence. Genome-Wide Association Studies (GWAS), for example, typically start by aligning the genomes of all samples in a panel against a reference genome. Each sample is then represented by its set of point mutations, and typical methods test the statistical association between the presence of a mutation and a phenotype of interest. In many important cases, however, alignments are not appropriate. Microbes, for example, sometimes have entire genes that are not present in all individuals. Most alignment-free representations rely on the exact presence of sub-sequences in the genomes. However, genomic variants are often better described in terms of sequence motifs, which indicate the frequency of each letter at each position. The recently introduced CKN-seq method implicitly defines infinite sets of genomic features akin to sequence motifs, and selects those that are most relevant for a learning task. The PIECES project will extend CKN-seq and exploit its ability to represent unaligned sequences through three tasks: (1) GWAS on infinite sets of patterns, (2) interpretable exploratory analysis of sequences, and (3) learning on populations of sequences.
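To illustrate the difference between exact sub-sequence presence and a motif-based description, here is a minimal sketch. It is not CKN-seq: it uses a plain position frequency matrix estimated from a few hypothetical example sites (the sites, pseudocount and uniform background are all assumptions chosen for illustration) and assumes sequences contain only the letters A, C, G and T. A genome lacking the exact k-mer can still score highly against the motif.

```python
import math

ALPHABET = "ACGT"


def kmer_present(sequence: str, kmer: str) -> bool:
    """Alignment-free feature: exact presence of a sub-sequence."""
    return kmer in sequence


def position_frequency_matrix(sites: list[str], pseudocount: float = 0.5) -> list[dict[str, float]]:
    """Estimate the frequency of each letter at each position from equal-length example sites.

    The pseudocount avoids zero frequencies (and hence log(0)) for unseen letters.
    """
    width = len(sites[0])
    pfm = []
    for i in range(width):
        counts = {letter: pseudocount for letter in ALPHABET}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pfm.append({letter: c / total for letter, c in counts.items()})
    return pfm


def best_motif_score(sequence: str, pfm: list[dict[str, float]], background: float = 0.25) -> float:
    """Maximum log-odds score (motif vs. uniform background) over all windows of the sequence."""
    width = len(pfm)
    best = -math.inf
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(math.log2(pfm[i][base] / background) for i, base in enumerate(window))
        best = max(best, score)
    return best


if __name__ == "__main__":
    # Hypothetical example sites: the last position is usually T but sometimes A.
    pfm = position_frequency_matrix(["ACGT", "ACGA", "ACGT"])
    genome = "TTACGATT"
    print(kmer_present(genome, "ACGT"))   # exact feature misses the variant
    print(best_motif_score(genome, pfm))  # motif still matches the window "ACGA"
```

The first call prints `False`, while the motif score is positive because the window `ACGA` agrees with the motif at every position. Scoring by log-odds against a background is one common convention; other scoring schemes would serve the same illustrative purpose.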

H2020 ITN Project ALPACA

Coordinator: Alexander Schönhuth, University of Bielefeld, Germany.
Duration: 2021-2024

Brief description
ALPACA (“Algorithms for PAngenome Computational Analysis”) is an EU Marie Skłodowska-Curie Innovative Training Networks (ITN) consortium grant that will train 14 PhD researchers, including one at CWI who will be co-supervised by Solon Pissis and Leen Stougie, and one at the University of Pisa supervised by Nadia Pisanti.
Genomes are strings over the letters A, C, G, T, which represent nucleotides, the building blocks of DNA. Ever more numerous and rapidly advancing sequencing devices are producing ultra-large amounts of genome sequence data, and the amount of sequencing data accrued is now reaching the exabyte scale. The driving, urgent question is therefore: how can we arrange and analyse these masses of data in a formally rigorous, computationally efficient and biomedically rewarding manner? Graph-based data structures have been argued to offer disruptive benefits over traditional sequence-based structures when representing pan-genomes, that is, sufficiently large, evolutionarily coherent collections of genomes. This project will put this paradigm shift, from sequence-based to graph-based representations of genomes, into full effect.
The project consortium consists of 23 academic and industrial partners, including CWI, the University of Bielefeld (Germany), the CNRS (France), Inria (France), the University of Pisa (Italy), the University of Milan-Bicocca (Italy), the Heinrich Heine University in Düsseldorf (Germany), the European Molecular Biology Laboratory (EMBL-EBI), the Comenius University in Bratislava (Slovakia), the University of Helsinki (Finland), the Pasteur Institute (France), and the University of Cambridge (UK).

NWO Grant on Optimisation for and with Machine Learning

Coordinator: D. den Hertog, Tilburg University
Participant in ERABLE Team at CWI: Leen Stougie.
Duration: 2020-2025

Brief description
Machine learning is often in the news because of remarkable applications such as image recognition and self-driving cars. When constructing machine learning models, such as deep learning models and random forests, mathematical optimisation plays an important role. In the first part of the project, we want to better understand the performance of existing optimisation techniques for machine learning and also develop faster and better ones. In the second part, we use machine learning techniques to solve optimisation problems faster and more accurately. The new techniques are applied to classification problems for medical treatments, finding genetic relationships, food distribution chains for the World Food Programme, and self-driving cars.
This programme is coordinated by Tilburg University, with participants at TU Delft (TUD), Tilburg University and CWI. In close collaboration with TUD, Leen Stougie from CWI and Erable, together with Leo J.J. van Iersel from TUD, will work on improving models and optimisation methods for problems inspired by the life sciences, such as the classification of virulent yeast strains and problems in phylogeny. Also at CWI, Nikhil Bansal and Monique Laurent from the Networks and Optimization group will study combinatorial and polynomial optimisation-based methods for machine learning, develop theoretical analyses of heuristics used in machine learning, and use machine learning to design optimisation algorithms that exploit structure in data. More information about this grant may be found here.

Inria Associated Team CAPOEIRA

Coordinators: Marie-France Sagot (Erable); André Fujita (Universidade de São Paulo (USP), São Paulo, Brazil).
Duration: 2020-2024

Brief description
The project covers theoretical computer science (essentially graph theory), mathematics (combinatorics, statistics, and probability), and the development of algorithms to address various biological questions, in particular intra- and cross-species interactions, which have implications in all aspects of the life sciences, including health, ecology, and the environment.
Two main general topics will be addressed, namely evolution/co-evolution, and the analysis and comparison of biological networks (graphs/hypergraphs). The first topic concerns, on one hand, better understanding and characterising the moment of speciation that leads to new species, and on the other, how one set of species may influence the evolution of another. The second topic concerns metabolism on one hand, and (post-)transcriptional regulation on the other, with the post-transcriptional level also involving the inference "from scratch" of the main actors, namely the non-coding RNAs and their targets, and of the regulatory network they form. In the first two cases (metabolism and transcriptional regulation), we will assume that the networks are already inferred, albeit possibly with numerous missing and incorrect data. Finally, in the case of regulation, we will also consider the problem of inferring variants, notably those related to alternative splicing, from a set of RNA-seq data using a de Bruijn graph approach. Overseeing these two main topics are the issues of knowledge representation and model revision, which will also be addressed. These are crucial in the life sciences, notably in the context of post-transcriptional regulation by non-coding RNAs, for which the different actors, features, and overall mechanisms are constantly being questioned and revised.

Capes-Cofecub Project AHIMSA

Coordinators: Marie-France Sagot (Erable); Andréa Ávila (Instituto de Biologia Molecular do Paraná – Fiocruz-PR, Curitiba, Paraná, Brazil).
Duration: 2020-2024

Brief description
One of the objectives of this project, the one with the highest risk of not succeeding, is to explore Ahimsa-like approaches to sickness and health. However, our main objective is first to understand how these organisms respond to drug treatments or reshape the host cells after infection. We further aim to adopt a community-level view of living organisms, gathering information from the multiple partners of the biological systems we are studying. This presents further risks that are both methodological and experimental. Modelling communities down to the molecular level is indeed hard because of a lack of sufficient or adequate data. Modelling and then experimentally manipulating such communities is also tricky because of the complexity of having to handle many different processes taking place at very different levels and time scales.

Older projects

Information on older projects of the team may be found here. Note that this page is still incomplete.