This page presents a partial list of the projects in which I participated or which I coordinated.
- Current projects
- Past projects (from 2009 on)
H2020 Twinning Project OLISSIPO
Coordinator: Susana Vinga, INESC-ID, Instituto Superior Técnico, Lisbon, Portugal; Coordinator in ERABLE Team at Inria: Marie-France Sagot.
OLISSIPO is an H2020 Twinning project coordinated by INESC-ID. The project focuses on Computational Biology, a strongly interdisciplinary area that combines Computer Science, Algorithms, Mathematics, Probability and Statistics, Machine Learning, Molecular Biology and Medicine. The project consortium is composed of INESC-ID (Coordinator), the National Institute for Research in Digital Science and Technology (Inria) through the Erable team, the Swiss Federal Institute of Technology (ETH Zurich) and the European Molecular Biology Laboratory (EMBL). The main goal of the project is to intensify, increase and consolidate the research in Computational Biology carried out at INESC-ID in partnership with the European partner institutions.
H2020 ITN Project ALPACA
Coordinator: Alexander Schönhuth, University of Bielefeld, Germany.
ALPACA (“Algorithms for PAngenome Computational Analysis”) is an EU Marie Skłodowska-Curie Innovative Training Networks (ITN) consortium grant that will train 14 PhD researchers, including one at CWI who will be co-supervised by Solon Pissis and Leen Stougie, together with Marie-France Sagot, and one at the University of Pisa supervised by Nadia Pisanti.
Genomes are strings over the letters A, C, G, T, which represent nucleotides, the building blocks of DNA. Ever more numerous and rapidly advancing genome sequencing devices produce ultra-large amounts of sequence data, with the accrued volume now reaching the exabyte scale. The driving, urgent question is: how can we arrange and analyse these data masses in a formally rigorous, computationally efficient and biomedically rewarding manner? Graph-based data structures have been reported to have disruptive benefits over traditional sequence-based structures when representing pan-genomes, that is, sufficiently large, evolutionarily coherent collections of genomes. This project will put this paradigm shift, from sequence-based to graph-based representations of genomes, into full effect.
The project consortium consists of 23 academic and industrial partners, including CWI, the University of Bielefeld (Germany), the CNRS (France), Inria (France), the University of Pisa (Italy), the University of Milan-Bicocca (Italy), the Heinrich Heine University in Düsseldorf (Germany), the European Molecular Biology Laboratory (EMBL-EBI), the Comenius University in Bratislava (Slovakia), the University of Helsinki (Finland), the Pasteur Institute (France), and the University of Cambridge (UK).
Inria Associated Team CAPOEIRA
Coordinators: Marie-France Sagot (Erable); André Fujita (Universidade de São Paulo (USP), São Paulo, Brazil).
The project covers theoretical computer science (essentially graph theory), mathematics (combinatorics, statistics, and probability), and the development of algorithms to address various biological questions, in particular intra- and cross-species interactions, which have implications in all aspects of the life sciences, including health, ecology, and environment.
Two main general topics will be addressed, namely evolution/co-evolution, and biological network (graph/hypergraph) analysis and comparison. The first topic concerns better understanding and characterising the moment of speciation leading to new species on one hand, and on the other, how one set of species may influence the evolution of another. The second topic concerns metabolism on one hand, and (post-)transcriptional regulation on the other, with the post-transcriptional level involving also inference “from scratch” of the main actors, namely the non-coding RNAs and their targets, and the regulatory network they form. In the first two cases (of metabolism and transcriptional regulation), we will assume that the networks are already inferred albeit with possibly numerous missing and incorrect data. Finally, in the case of regulation, we will also consider the problem of inferring variants, notably related to alternative splicing, from a set of RNA-seq data using a de Bruijn graph approach. Overseeing these two main topics are the issues of knowledge representation and model revision that will also be addressed. These are crucial in the life sciences, and notably in the context of post-transcriptional regulation by non-coding RNAs, for which the different actors, features, and overall mechanisms are constantly being questioned and revised.
Capes-Cofecub Project AHIMSA
Coordinators: Marie-France Sagot (Erable); Andréa Ávila (Instituto de Biologia Molecular do Paraná – Fiocruz-PR, Curitiba, Paraná, Brazil).
One of the objectives of this project, the one with highest risk of not succeeding, is to explore Ahimsa-like approaches to sickness and health. However, our main objective is to first understand how these organisms respond to drug treatments or reshape the host cells after infection. We further aim to focus on a community-vision approach to living organisms which will try to gather information from multiple partners of the biological systems we are studying. This presents further risks that are both methodological and experimental. Modelling communities up to the molecular level is indeed hard because of a lack of enough or of adequate data. Modelling and then experimentally manipulating such communities is tricky also because of the complexity of having to handle many different processes taking place at very different levels and time scales.
Inria Associated Team COMPASSO
Coordinators: Marie-France Sagot, Erable, Inria; Susana Vinga, Instituto Superior Técnico (IST), Lisbon, Portugal.
Microbial communities are ubiquitous in nature and have a major impact on every aspect of life on our planet. In spite of their importance, little is known about the principles that determine the functioning, robustness, evolution and control of such communities. The two partner teams of this project have some history of collaboration. So far, however, their main direct scientific concerns have been distinct in terms of final goals, while their areas of expertise are both concentrated on computer science, though with some distinct characteristics. The main aim of this project is to work together towards establishing a strong link between the different application goals each team has had so far. This is an ambitious project that will depend heavily on further blending the different kinds of expertise of the two teams.
The French team has for some ten years been highly interested in modelling and exploring species interactions. Such interactions indeed appear crucial to understanding some, if not all, of the most fundamental evolutionary and functional questions related to living organisms. They remain, however, very little explored by computational biologists.
The Portuguese team, on the other hand, has for a few years been involved in a number of projects related mainly to cancer and rare diseases. The objective has been to develop statistical and machine learning algorithms that use multi-omic data to help propose personalised treatments for these diseases.
The ultimate aim of this project is to start building links between these two aspects, species interactions and cancer/rare diseases, or more precisely between infectious and non-infectious diseases, whether they involve humans or animals more generally. The main general questions that will be addressed are the following: (i) Are species interactions really a crucial factor in the development of at least some non-infectious diseases, as is suspected? (ii) If yes, could such diseases be treated in a “non-aggressive” way by exploiting these species interactions? These are highly ambitious questions that will in the first three years be tackled from two angles. The first concerns modelling and understanding the systems biology of communities, and the second modelling and understanding the co-evolutionary aspects present in such communities. The first will in fact cover both synthetic communities and natural ones.
ANR project GREEN
Coordinators: (General) Abdelaziz Heddi (Insa-Lyon), (in LBBE) Cristina Vieira; Participant in BAOBAB-ERABLE Teams LBBE-UCBL-INRIA: Marie-France Sagot.
Most insect pests thriving on nutritionally poor habitats have evolved obligate mutualistic relationships with heritable intracellular bacteria (endosymbionts) that supplement their diet with limiting nutrients and thereby improve their adaptive and invasive powers. The endosymbiont distribution is restricted to female germ cells and to the bacteriocytes, i.e. the specialised cells that seclude the bacteria and prevent their exposure to the host immune system. Remarkably, neither the host nor the endosymbiont can survive independently outside these integrated associations. Investigating the mechanisms by which insects maintain endosymbionts and control their number will contribute to the identification of new specific targets of host-symbiont interaction and of host homeostasis and fitness. By investigating the endosymbiotic association between the cereal weevil Sitophilus oryzae and the Gram-negative bacterium Sodalis pierantonius, we showed that bacteriocytes display a modulated expression of immune genes, notably marked by a down-regulation of most immune effectors, and that S. pierantonius undergoes highly contrasted dynamics along the host life cycle. The endosymbiont load is controlled and adjusted to the host's physiological and developmental needs through specific immune gene expression, cell apoptosis, and autophagy. The present project aims at unravelling the major host genes involved in symbiosis homeostasis and endosymbiont dynamics, and at deciphering their mechanisms of regulation and function. We will decipher the molecular bases of the host-symbiont interactions at critical phases of host development by using the dual RNA-seq technology, which makes it possible to screen the transcriptomes of host and endosymbiont simultaneously and to pinpoint their coordinated and contrasted gene expression.
To go further into how the bacteriocyte immune response has evolved to express a limited set of immune effectors, and what are the regulatory elements behind this immunomodulation, we will identify cis-regulatory elements, non-coding RNAs, and candidate transcription factors acting as master regulators. Finally, we will analyze the function of selected candidate genes related to the bacteriocyte homeostasis or symbiosis dynamics during the host life cycle by combining complementary functional genomics tools, including in situ transcript and protein localisation, RNA interference transcript inhibition, and structure-activity analysis of candidate proteins.
By combining in silico and wet lab tools, we expect to provide a clear picture of the gene players and of how they are regulated in both endosymbiosis homeostasis and endosymbiont dynamics. Our ambition is to provide the foundation for identifying specific molecules that disrupt the endosymbiotic relationship, as a novel control strategy for weevils and other major insect pests.
ANR project ASTER
Coordinators: (General) Hélène Touzet, Bonsai team, CRIStAL-INRIA, (in Lille) David Hot, Institut Pasteur de Lille, (in Lyon) Vincent Lacroix, BAOBAB-ERABLE Teams LBBE-UCBL-INRIA, (in Paris) Jean-Marc Aury, CEA.
The ANR project ASTER proposes to develop algorithms and software for analysing third generation sequencing data. Third generation is an emerging technology for RNA and DNA sequencing that promises to give a better picture for studying transcriptomes, metagenomes and metatranscriptomes of all living organisms. It will be key for discovering new fundamental mechanisms in cell biology, with broad implications in environmental research, health and agriculture. However, analysing the data is computationally challenging due to a very high rate of sequencing errors. There is a pressing need for models and algorithms that can accommodate this new kind of data and that are also scalable.
Past projects (from 2009 on)
This list is incomplete. It contains only the projects that started from 2009.
Coordinator: Marie-France Sagot, BAMBOO, Inria (sole partner)
ERC Advanced Grant SISYPHE
Coordinator: Marie-France Sagot, BAMBOO that then became ERABLE, Inria (sole partner)
Symbiosis, or at least its extent, role and precise nature are controversial but symbiosis appears also essential to understand some of the most fundamental evolutionary and functional questions related to living organisms. Nevertheless, although symbiotic relationships have been studied by biologists since probably the early 19th century, they remain little studied by computational biologists. Yet the enormous variety in the observed types of pair- and multi-wise symbiotic relations, and the fact that these relationships touch upon almost every aspect of biology, from molecular to ecological, raise formidable mathematical and computational issues. Addressing some of the main such issues to arrive at a better understanding of the processes of “acquisition and maintenance of one or more organisms by another” and of the (co)evolution of “novel structures and metabolism” with which such processes are associated is the purpose of this project. The approach proposed blends a mathematical (combinatorial, statistical) exploration of the huge variety of genomic and biochemical landscapes observed in the symbiont world, and at the interface between symbionts and hosts or of both and their environment, together with wet-lab experiments.
INRIA International Partnership AMICI
Coordinator: Marie-France Sagot, BAMBOO, Inria, with Universities of Florence, Pisa and Rome La Sapienza in Italy, and Free University and CWI Amsterdam in the Netherlands
The Inria International Partnership AMICI followed on from the INRIA Associated Team SIMBIOSI, which started in January 2009 and ended in December 2011. The three coordinators of SIMBIOSI were Marie-France Sagot for INRIA, Alberto Marchetti-Spaccamela (University of Rome La Sapienza) and Leen Stougie (Free University of Amsterdam and CWI).
The members of AMICI were the three above, together with the whole of BAMBOO, plus:
- Nadia Pisanti and Roberto Grossi from the University of Pisa, Italy, with whom we have been collaborating since the PhD of Nadia Pisanti in 2002;
- Pierluigi Crescenzi from the University of Florence, Italy, who joined the Associated Team SIMBIOSI as collaborator early on in its creation (in mid-2009). He is also since March 2011 co-supervisor of two PhD students (Gustavo Sacomoto and Beatrice Donati) with Marie-France Sagot (PhDs funded by the ERC AdG Sisyphe).
BAMBOO proposed an evolution of AMICI towards another structure, that of a European Inria Project-Team called ERABLE, whose creation is currently under way. ERABLE has existed since January 1st, 2015, as an Inria “Center-Team”.
Coordinator: (General) Pierre Peterlongo GenScale INRIA, (in Lyon) Vincent Lacroix BAMBOO INRIA-LBBE-UCBL, (in Montpellier) Eric Rivals LIRMM
The main goal of Colib’read was to design new algorithms dedicated to the extraction of biological knowledge from raw (non-assembled) data produced by High Throughput Sequencers (HTS), also called Next Generation Sequencers (NGS).
A few years ago, genomics underwent an unprecedented change with the advent of High Throughput Sequencing (HTS), also known as Next Generation Sequencing (NGS). These technologies generate data of a new type in huge volumes. Crucial computational developments are needed to take full advantage of these data. Our project proposed an original way of extracting information from such data. Usually, a generic assembly (pretreatment) is applied to the data, and then, in a second step, any information of interest is extracted. Our aim was to avoid this protocol, which leads to a significant loss of information or generates chimeric results because of the heuristics used in the assembly. Instead, we developed a set of innovative methods for extracting information of biological interest from HTS data that bypass any costly and often inaccurate assembly step. Importantly, the developed methods do not require the availability of a reference genome. This considerably broadened the spectrum of applications of our methods. In short, for each biological question, our general approach consisted in 1) defining a model for the searched elements; 2) detecting in one or several HTS datasets those elements that fit the model; 3) outputting those together with a score and their genomic neighbourhood. From a computational viewpoint, our proposal relied on a formal model based on the de Bruijn graph structure to develop algorithms able to handle a huge amount of data. Among other results, Colib’read delivered algorithms based on the de Bruijn graph, and tools validated by biologists.
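As a rough illustration of the de Bruijn graph structure mentioned above, the following minimal Python sketch builds such a graph directly from raw reads, without any assembly or reference genome, and lists the nodes where the graph branches. It is not the Colib’read software: the function names `build_de_bruijn` and `branching_nodes`, the toy reads and the choice k = 3 are all assumptions made for the example. A branching node is one simple pattern that a model of "searched elements" might start from, since two reads that differ at one position diverge there.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph from reads (illustrative helper, hypothetical name).

    Nodes are (k-1)-mers; each k-mer found in a read adds an edge from its
    prefix to its suffix. The value stored on an edge is the number of times
    the corresponding k-mer was seen.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def branching_nodes(graph):
    """Return, in sorted order, the nodes that have more than one successor."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


# Toy data: two reads that differ at a single position.
reads = ["ACGTAC", "ACGAAC"]
graph = build_de_bruijn(reads, 3)
print(branching_nodes(graph))  # the node(s) where the two reads diverge
```

On real data, k would be much larger and the graph would be stored in a compact index rather than nested dictionaries; the sketch only shows the principle of working on k-mers of unassembled reads.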
This project was at the interface between (i) fundamental computational questions, (ii) algorithmic developments including the design of ad-hoc indexes and parallelisation, and (iii) biological applications for validation. Finally (iv) it also proposed a large public and educational dissemination.
More information on it may be found here.
FP7 KBBE BacHBerry
The FP7 KBBE BacHBerry project (“BACterial Hosts for production of Bioactive phenolics from bERRY fruits”) started in November 2013 and ended in October 2016.
Coordinator: Jochen Förster, Novo Nordisk Foundation Center for Biosustainability (CFB), Copenhagen, Denmark
The main objective of BacHBerry was to develop innovative methodologies for tapping the commercial potential of plant metabolites, namely phenolic compounds in berry fruits, overcoming current scientific and technological barriers in the field of bio-industry, for the generation of bacterial platforms for sustainable, bio-based production of the desired plant metabolites.
Plants synthesize a staggering variety of secondary metabolites, and this chemodiversity is a poorly used pool of natural molecules with bioactive properties of importance for applications in the pharma and food industries. BacHBerry focused on phenolic compounds, a large and diverse class of plant metabolites, which are currently in the spotlight due to their claimed beneficial effects in prevention and treatment of chronic diseases, but that also have applications as cosmetics, flavours and food colorants etc. Berries are soft and colourful fruits, with great diversity, high content and unique profiles in phenolic compounds, making them a major source of these high-value metabolites. The BacHBerry project aimed to develop a portfolio of sustainable methodologies to mine the potential of the untapped biodiversity of the bioactive phenolic compounds in an extensive collection of berry species. Full exploitation of this unrivalled natural resource requires an integrated and comprehensive effort from bioprospecting in berries using SMART high-throughput screens for the valorisation of phenolic bioactivities aligned with their identification using cutting edge analytics and subsequent elucidation of their biosynthetic pathways. This knowledge will facilitate metabolic engineering of suitable bacterial hosts for high-value phenolics production in scalable fermentation bioprocesses, ultimately serving as commercial production platforms. The consortium comprised a full chain of research and innovation, with 12 research groups, 5 SMEs and a large enterprise, representing 10 countries including partners from ICPC countries Russia, Chile and China, with the capacity to exploit novel bioactivities from berry fruits diversity. 
BacHBerry developed a pipeline of sustainable and cost-effective processes to facilitate production of added-value berry phenolics with immediate potential for commercialisation and consequent socio-economic benefits for the European community and beyond.
For more information on BacHBerry, see http://www.bachberry.eu/.
Inria Associated Team ALEGRIA
Coordinator: (France) Marie-France Sagot, ERABLE Team, Inria and (Brazil) Andrea Ávila, Instituto de Biologia Molecular do Paraná – Fiocruz-PR, Paraná.
Parasitic protists include agents of human and animal diseases that have a huge impact on world populations and economy. The major public health problems of protozoan organisms come mainly from the phylum Apicomplexa or the Class Kinetoplastida (from the phylum Euglenozoa).
An important subject yet largely under-explored is the fact that most members from these groups are pathogenic while a small fraction is not, which raises the question of what gives origin to the pathogenicity of these parasites. This is the main question we wish to address by means of computational methods and wet-lab experiments.
Stic AmSud project MAIA
Coordinators: (France) Marie-France Sagot, ERABLE Team, Inria; (Brazil) Roberto Marcondes César Jr, Instituto de Matemática e Estatística, Universidade de São Paulo; and Paulo Vieira Milreu, TecSinapse; (Chile) Vicente Acuña, Centro de Modelamiento Matemático, Santiago; and Gonzalo Ruz, University Adolfo Ibañez, Santiago.
This project has two main goals: one methodological that aims to explore how accurately hard problems can be solved theoretically by different approaches – exact, approximate, randomised, heuristic – and combinations thereof, and a second that aims to better understand the extent and the role of interspecific interactions in all main life processes by using the methodological insights gained in the first goal and the algorithms developed as a consequence.
H2020-MSCA-ETN-2014 project MicroWine
Coordinator: Lars Hestbjerg Hansen, Department of Environmental Science – Environmental microbiology & biotechnology, Aarhus University, Aarhus, Denmark.
A diverse, complex, and poorly characterised community of microorganisms lies at the heart of the wine. These microorganisms play key roles at all stages of the viniculture and vinification processes, from helping the plants access nutrients from the soil, driving the plants’ health through protection against pathogens, to the fermentation process that transforms the must into wine with its complex array of aromas and flavours.
The main aim of MicroWine is to gain an improved understanding of this microbial community and of its interplay with the wine.
CNRS-UCBL-INRIA International Associated Laboratory LIRIO
Coordinators: (France) Marie-France Sagot, ERABLE-BAOBAB LBBE and (Brazil) Ana Tereza Ribeiro de Vasconcelos, Labinfo LNCC
Duration: 2012-2015, renewed 2016-2019
The CNRS-UCBL-INRIA International Associated Laboratory (Laboratoire International Associé – LIA) LIRIO builds upon a strong collaboration between the team of a French-Brazilian researcher with a background in discrete mathematics and algorithmics for the life sciences who has made her scientific career in France, since 2001 in the Laboratoire de Biométrie et Biologie Évolutive UMR 5558, and the team of a Brazilian researcher with a background in genetics and bioinformatics, and extensive national and international links in the area of bioinformatics. The research conducted in the LIA will bring together all the activities currently conducted by each team separately or that each team has already planned, as well as new research that the synergy between the two teams will make it possible to address in future. This synergy should also allow us to apply for other sources of funding to support the research we wish to develop. Initially, this research will be concentrated on two main axes, one strongly concerned with the host-parasite relationship and the second with micro-environmental genomics and systems biology. Both address complex systems through a broad variety of experimental, bioinformatic and algorithmic approaches that reflect the complementarity of the two teams involved (biology, including an experimental part, for the Brazilian team; algorithmics for the French one), with bioinformatics as a common language between the two. Besides fundamental issues, the two axes may also have important health-related implications. The topics in these two axes belong to one of the five “thématiques au cœur de l’INEE”, namely “Biodiversité et écologie fonctionnelle”, and cover three “thèmes d’interface”, namely “Biodiversité, structure, dynamique et fonctionnalité”, “Mécanismes d’adaptation et d’évolution” and “Environnement et santé”.
Training will represent another key aspect of the LIA, and will aim at extending the already intensive exchanges of researchers, Master and PhD students between the two French and Brazilian partners of the LIA. The bioinformatic aspect of the two axes of research, both sequencing and data analysis, will also greatly benefit from an interaction between the platforms with which each partner is involved in her own country.
Associated with LIRIO, there are also a number of projects whose description may be found here.