NWO Grant on Optimisation for and with Machine Learning
Coordinator: D. den Hertog, Tilburg University; Participant in ERABLE Team at CWI: Leen Stougie.
Machine learning is often in the news because of remarkable applications such as image recognition and self-driving cars. When constructing machine learning models, such as deep learning and random forests, mathematical optimization plays an important role. In the first project part we want to better understand the performance of existing optimization techniques for machine learning and also develop faster and better optimization techniques. In the second part we use machine learning techniques to solve optimization problems faster and more accurately. The new techniques are applied to classification problems for medical treatments, finding genetic relationships, food distribution chains for the World Food Programme, and self-driving cars.
This programme is coordinated by Tilburg University, with participants at TUD, Tilburg University and CWI. In close collaboration with TUD, Leen Stougie from CWI and Erable, together with Leo J.J. van Iersel from TUD, will work on improving models and optimisation methods for problems inspired by the life sciences, such as the classification of virulent yeast strains and problems in phylogeny. Also at CWI, Nikhil Bansal and Monique Laurent from the Networks and Optimization group will study methods for machine learning based on combinatorial and polynomial optimisation, develop theoretical analyses of heuristics used in machine learning, and use machine learning to design optimisation algorithms that exploit structure in data. More information about this grant may be found here.
Inria Associated Team CAPOEIRA
Coordinators: Marie-France Sagot (Erable); André Fujita (Universidade de São Paulo (USP), São Paulo, Brazil).
The project covers theoretical computer science (essentially graph theory), mathematics (combinatorics, statistics, and probability), and the development of algorithms to address various biological questions, in particular intra- and cross-species interactions, which have implications for all aspects of the life sciences, including health, ecology, and the environment.
Two main general topics will be addressed, namely evolution/co-evolution, and biological network (graph/hypergraph) analysis and comparison. The first topic concerns better understanding and characterising the moment of speciation leading to new species on one hand, and on the other, how one set of species may influence the evolution of another. The second topic concerns metabolism on one hand, and (post-)transcriptional regulation on the other, with the post-transcriptional level involving also inference “from scratch” of the main actors, namely the non-coding RNAs and their targets, and the regulatory network they form. In the first two cases (of metabolism and transcriptional regulation), we will assume that the networks are already inferred albeit with possibly numerous missing and incorrect data. Finally, in the case of regulation, we will also consider the problem of inferring variants, notably related to alternative splicing, from a set of RNA-seq data using a de Bruijn graph approach. Overseeing these two main topics are the issues of knowledge representation and model revision that will also be addressed. These are crucial in the life sciences, and notably in the context of post-transcriptional regulation by non-coding RNAs, for which the different actors, features, and overall mechanisms are constantly being questioned and revised.
Capes-Cofecub Project AHIMSA
Coordinators: Marie-France Sagot (Erable); Andréa Ávila (Instituto de Biologia Molecular do Paraná – Fiocruz-PR, Curitiba, Paraná, Brazil).
One of the objectives of this project, the one with the highest risk of not succeeding, is to explore Ahimsa-like approaches to sickness and health. However, our main objective is first to understand how these organisms respond to drug treatments or reshape the host cells after infection. We further aim to focus on a community-vision approach to living organisms, which will try to gather information from multiple partners of the biological systems we are studying. This presents further risks that are both methodological and experimental. Modelling communities down to the molecular level is indeed hard because sufficient or adequate data are lacking. Modelling and then experimentally manipulating such communities is also tricky because it requires handling many different processes that take place at very different levels and time scales.
ANR project U4ATAC-BRAIN
Coordinators: (General) Patrick Edery & Sylvie Mazoyer (CRNL Lyon, INSERM-CNRS-UCBL), (in Lyon) Vincent Lacroix BAOBAB-ERABLE Teams LBBE-UCBL-INRIA, (in Montpellier) Remi Bordonné IGMM CNRS, (in Paris) Anne-Louise Leutenegger U946 INSERM.
The aim of this proposal is to unravel the role played by minor splicing in embryonic development, with a focus on brain development. We will achieve this through the study of rare malformation syndromes with abnormal brain development caused by a defect of U4atac snRNA, a component of the minor spliceosome. We will take advantage of our collective first-rate expertise in human syndromes, genomics, transcriptomics, bioinformatics, biochemistry and the zebrafish model to perform a multidisciplinary research study aimed at understanding the physiopathology of brain malformations caused by RNU4ATAC-associated minor splicing defects. The present project will also help to improve the genetic counselling of families affected by an RNU4ATAC mutation or related disorders.
Inria Associated Team COMPASSO
Coordinators: Marie-France Sagot (Erable); Susana Vinga (Instituto Superior Técnico (IST), Lisbon, Portugal).
Microbial communities are ubiquitous in nature and have a major impact on every aspect of life on our planet. In spite of their importance, little is known about the principles that determine the functioning, robustness, evolution and control of such communities. The two partner teams of this project have some history of collaboration. So far, however, their main scientific concerns have been distinct in terms of final goals, while their areas of expertise are both concentrated on computer science, albeit with some distinct characteristics. The main aim of this project is to work together towards establishing a strong link between the different application goals each team has had so far. This is an ambitious project that will depend heavily on further blending the different kinds of expertise that each team has.
The French team has for some ten years been highly interested in modelling and exploring species interactions. Such interactions indeed appear crucial to understanding some, if not all, of the most fundamental evolutionary and functional questions related to living organisms. They have, however, been very little explored by computational biologists.
The Portuguese team, on the other hand, has for a few years been involved in a number of projects related mainly to cancer and rare diseases. The objective has been to develop statistical and machine learning algorithms that use multi-omic data to help propose personalised treatments for these diseases.
The ultimate aim of this project is to start building links between these two aspects, species interactions and cancer/rare diseases, or more precisely, between infectious and non-infectious diseases, whether they involve humans or animals more generally. The main general questions that will be addressed are the following: (i) Are species interactions really a crucial factor in the development of at least some non-infectious diseases, as is suspected? (ii) If so, could such diseases be treated in a “non-aggressive” way by exploiting these species interactions? These are highly ambitious questions that will, in the first three years, be tackled from two angles. The first concerns modelling and understanding the systems biology of communities, and the second modelling and understanding the co-evolutionary aspects present in such communities. The first will in fact cover both synthetic communities and natural ones.
ANR Technology SPOCK
Coordinator: Lilia (Brinza) Boucinha, MaatPharma; PhD supervisors from academia: Marie-France Sagot (Erable); Susana Vinga (Instituto Superior Técnico (IST), Lisbon, Portugal).
The PhD project SPOCK (Cifre scholarship), funded by the ANR Technology, will consist in the development of a unified and standardised framework for quantitative metagenomics in a clinical/industrial context, whose main purpose is to identify the key microbial players in the host-microbiome dialogue and in the maintenance of human health. The beneficiary of the PhD scholarship is Marianne Borderes.
ANR project GREEN
Coordinators: (General) Abdelaziz Heddi (Insa-Lyon), (in LBBE) Cristina Vieira; Participant in BAOBAB-ERABLE Teams LBBE-UCBL-INRIA: Marie-France Sagot.
Most insect pests thriving in nutritionally poor habitats have evolved obligate mutualistic relationships with heritable intracellular bacteria (endosymbionts) that supplement their diet with limiting nutrients and thereby improve their adaptive and invasive powers. The endosymbiont distribution is restricted to female germ cells and to the bacteriocytes, i.e. the specialised cells that seclude the bacteria and prevent their exposure to the host immune system. Remarkably, neither the host nor the endosymbiont can survive independently outside these integrated associations. Investigating the mechanisms by which insects maintain endosymbionts and control their number will contribute to the identification of new specific targets of host-symbiont interaction and of host homeostasis and fitness. By investigating the endosymbiotic association between the cereal weevil Sitophilus oryzae and the Gram-negative bacterium Sodalis pierantonius, we showed that bacteriocytes display a modulated expression of immune genes, notably marked by a down-regulation of most immune effectors, and that S. pierantonius undergoes highly contrasted dynamics along the host life cycle. The endosymbiont load is controlled and adjusted to the host's physiological and developmental needs through specific immune gene expression, cell apoptosis, and autophagy. The present project aims at unravelling the major host genes involved in symbiosis homeostasis and endosymbiont dynamics, and at deciphering their mechanisms of regulation and function. We will decipher the molecular bases of the host-symbiont interactions at critical phases of host development by using the dual-RNA-seq technology, which makes it possible to screen the transcriptomes of host and endosymbiont simultaneously and to pinpoint their coordinated and contrasted gene expression.
To go further into how the bacteriocyte immune response has evolved to express a limited set of immune effectors, and what the regulatory elements behind this immunomodulation are, we will identify cis-regulatory elements, non-coding RNAs, and candidate transcription factors acting as master regulators. Finally, we will analyse the function of selected candidate genes related to bacteriocyte homeostasis or symbiosis dynamics during the host life cycle by combining complementary functional genomics tools, including in situ transcript and protein localisation, RNA interference transcript inhibition, and structure-activity analysis of candidate proteins.
By combining in silico and wet lab tools, we expect to provide a clear picture of the gene players and of how they are regulated, both in endosymbiosis homeostasis and along endosymbiont dynamics. Our ambition is to provide the foundation for identifying specific molecules that disrupt the endosymbiotic relationship, as a novel control strategy for weevils and other major insect pests.
ANR project ASTER
Coordinators: (General) Hélène Touzet, Bonsai team, CRIStAL-INRIA, (in Lille) David Hot, Institut Pasteur de Lille, (in Lyon) Vincent Lacroix BAOBAB-ERABLE Teams LBBE-UCBL-INRIA, (in Paris) Jean-Marc Aury, CEA.
The ANR project ASTER proposes to develop algorithms and software for analysing third-generation sequencing data. Third-generation sequencing is an emerging technology for RNA and DNA sequencing that promises to give a better picture for studying the transcriptomes, metagenomes and metatranscriptomes of all living organisms. It will be key for discovering new fundamental mechanisms in cell biology, with broad implications for environmental research, health and agriculture. However, analysing the data is computationally challenging due to a very high rate of sequencing errors. There is a pressing need for models and algorithms that can accommodate this new kind of data and that are also scalable.
ANR project GraphEn
Coordinators: (General) Dieter Kratsch, LITA, University of Lorraine, France, (in Clermont-Ferrand) Mamadou Moustapha Kanté, LIMOS, University of Clermont-Ferrand, France, (in Bordeaux) Paul Dorbec, LABRI, University of Bordeaux, France; Participant in BAOBAB-ERABLE Teams LBBE-UCBL-INRIA: Arnaud Mary.
The P vs. NP question is arguably the most important open question in Theoretical Computer Science these days. Under the widely believed assumption that the complexity classes P and NP are not equal, there are problems that cannot be solved efficiently with the help of computers. Thus it is important to identify such problems and to find other ways of dealing with them, different from the traditional means of polynomial-time algorithms. Unfortunately, many problems of great theoretical importance and also many problems that arise from real applications turn out to be intractable in the general case.
While optimisation is ubiquitous in Computer Science and a lot of research has been done on algorithms for, and the complexity of, optimisation problems, surprisingly little attention has been given to enumeration. A solution of the enumeration version of a problem typically provides an immediate solution for the optimisation version of the problem. This seems to suggest that enumeration is “much harder” than optimisation, which, among other things, directed the search for tractability and efficient algorithms towards optimisation problems. New insights from recent research on the exact complexity of hard problems indicate that the relation between enumeration and optimisation is more subtle and deserves a fundamental study from a theoretical point of view.
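As an illustration of the remark that an enumeration algorithm immediately yields an optimisation algorithm, here is a minimal Python sketch. The choice of problem (independent sets in a graph), the brute-force strategy, and the function names `independent_sets` and `maximum_independent_set` are assumptions made for this example only; they are not taken from the project and say nothing about the techniques it will study.

```python
from itertools import combinations


def independent_sets(vertices, edges):
    """Yield every independent set of the graph, by increasing size.

    Brute force over all vertex subsets: exponential time, for illustration only.
    """
    edge_set = {frozenset(e) for e in edges}
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            # A subset is independent if no two of its vertices are adjacent.
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                yield frozenset(subset)


def maximum_independent_set(vertices, edges):
    """Solve the optimisation version by scanning the enumeration output."""
    return max(independent_sets(vertices, edges), key=len)


# Hypothetical toy instance: the path a - b - c - d.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("c", "d")]

all_sets = list(independent_sets(vertices, edges))
best = maximum_independent_set(vertices, edges)
print(len(all_sets), sorted(best))
```

The optimisation routine is a single pass over the enumerated solutions, which is the sense in which enumeration "contains" optimisation. The converse does not hold in any obvious way, and the number of solutions to enumerate may itself be exponential in the size of the input.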
Listing, generating or enumerating objects of a specified type and with specified properties has important applications in various domains of Computer Science, such as data mining, machine learning and artificial intelligence, as well as in other sciences, in particular biology, and also has many real-life applications. This is one of the motivations for our interest in enumeration. The scientific goals of the project are of a theoretical nature and oriented towards a better understanding of the complexity of enumeration and the study of algorithmic techniques to solve enumeration problems. This project will concentrate on problems for graphs and hypergraphs and study three different approaches to the algorithmics of enumeration.
Information on older projects of the team may be found here. Note that this page remains incomplete.