Software – PLEIADE

Highlights of our work in the Inria catalog of research software:

Biodiversiton
- Functional description:
  
  Biodiversiton is a suite of tools for biodiversity composed of Rsyst, pairwise_dis, diagno_syst, and yapotu. The global project provides tutorials, datasets, and a readme for the whole suite.
- Scientific description:
- Privileged contact:
  
  Alain Franc (alain.franc@inria.fr)
- Participants:
- Structures:
  
  PLEIADE
- Website:
  
  https://gitlab.inria.fr/biodiversiton
Rsyst
- Functional description:
  
  Contains the R-Syst databases in SQLite format, together with Python programs that provide an interface for the most common queries.
- Scientific description:
- Privileged contact:
  
  Alain Franc (alain.franc@inria.fr)
- Participants:
- Structures:
  
  PLEIADE
- Website:
  
  https://gitlab.inria.fr/metabarcoding/rsyst
disseq
- Functional description:
  
  C and MPI routines for calculating pairwise distances between DNA sequences:
  – disseq: standalone calculation for several thousand sequences on a laptop,
  – mpi-disseq: version for distributed-memory computation with MPI, with a clock-like scaling from a laptop to a national centre (a distance matrix for 1 000 000 sequences has been computed).
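
As a rough illustration of what a pairwise distance computation involves, here is a minimal Python sketch. It assumes the distance is the plain edit (Levenshtein) distance; the description above does not say which distance disseq uses, and the function names are hypothetical. disseq itself is written in C and MPI, not Python.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def pairwise_distances(sequences):
    """Symmetric distance matrix; only the upper triangle is computed."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = edit_distance(sequences[i], sequences[j])
    return matrix


# Toy reads, invented for the example.
reads = ["ACGTAC", "ACGTTC", "ACGAC"]
distances = pairwise_distances(reads)
```

The n(n-1)/2 pairs are independent of one another, which is what makes this kind of computation a natural candidate for distribution over many processes.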
- Scientific description:
- Privileged contact:
  
  Jean-Marc Frigerio (jean-marc.frigerio@inria.fr)
- Participants:
  
  Philippe Chaumeil (philippe.chaumeil@inria.fr), Alain Franc (alain.franc@inria.fr), Jean-Marc Frigerio (jean-marc.frigerio@inria.fr), Franck Salin (franck.salin@inria.fr), Sylvie Thérond (sylvie.therond@idris.fr)
- Structures:
  
  PLEIADE
- Website:
  
  https://gitlab.inria.fr/biodiversiton/disseq
pydiodon
- Functional description:
  
  Most dimension reduction methods inherited from Multivariate Data Analysis, and currently used in statistical learning to handle very large datasets (where the dimension of the space is the number of features), rely on a chain of pre-treatments, a core SVD step for low-rank approximation of a given matrix, and a post-treatment for interpreting results. The costly part of the computation is the SVD, which has cubic complexity. Diodon is a collection of functions and drivers that implement (i) pre-treatments, SVD and post-treatments for a wide range of methods, (ii) random projection methods for running the SVD, which make it possible to bypass the time limit of the exact SVD, and (iii) a C++ implementation of the SVD with random projection at prescribed rank or precision, connected to MDS.
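
To make the random projection idea concrete, the sketch below shows the range-finding step of a randomized low-rank approximation in pure Python: the matrix is multiplied by a random Gaussian matrix, the result is orthonormalized, and the original matrix is projected onto that small basis. This is a generic textbook illustration, not the pydiodon or cppdiodon API; all function names are hypothetical, and the final SVD of the small matrix `b` is left out.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthonormal_columns(y, tol=1e-10):
    """Modified Gram-Schmidt on the columns of y; near-zero columns are dropped."""
    basis = []
    for col in transpose(y):
        v = list(col)
        for q in basis:
            dot = sum(s * t for s, t in zip(q, v))
            v = [s - dot * t for s, t in zip(v, q)]
        norm = sum(s * s for s in v) ** 0.5
        if norm > tol:
            basis.append([s / norm for s in v])
    return transpose(basis)  # columns are the orthonormal vectors


def randomized_low_rank(a, sketch_size, seed=0):
    """Return (q, b) with a ~ q @ b, where q has orthonormal columns."""
    rng = random.Random(seed)
    n_cols = len(a[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(sketch_size)] for _ in range(n_cols)]
    q = orthonormal_columns(matmul(a, omega))  # basis of the sketched range
    b = matmul(transpose(q), a)                # small matrix, at most sketch_size rows
    return q, b


# A 6 x 5 matrix of exact rank 2, built as a product of thin factors.
u = [[1.0, 0.0], [2.0, 1.0], [3.0, -1.0], [0.5, 2.0], [-1.0, 1.5], [4.0, 0.0]]
v = [[1.0, 2.0, 0.0, -1.0, 3.0], [0.0, 1.0, 1.0, 2.0, -2.0]]
a = matmul(u, v)

q, b = randomized_low_rank(a, sketch_size=4)
approx = matmul(q, b)
max_error = max(
    abs(x - y) for row_a, row_p in zip(a, approx) for x, y in zip(row_a, row_p)
)
```

Because the toy matrix has exact rank 2, the projected matrix reproduces it up to rounding error. The expensive decomposition then only has to be run on `b`, whose row count is bounded by the sketch size rather than by the size of the data.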
  
  Pydiodon is a deliverable of the ADT Diodon (see https://gitlab.inria.fr/diodon), which will provide an API in Python (pydiodon) and C++ (cppdiodon), the former developed by Pleiade with the SED, the latter developed by the SED with Hiepacs (connections with FMR).
- Scientific description:
- Privileged contact:
  
  Alain Franc (alain.franc@inria.fr)
- Participants:
  
  Alain Franc (alain.franc@inria.fr), Jean-Marc Frigerio (jean-marc.frigerio@inria.fr), Franck Salin (franck.salin@inria.fr), Florent Pruvost (Florent.Pruvost@inria.fr)
- Structures:
  
  PLEIADE
- Website:
  
  https://gitlab.inria.fr/diodon/pydiodon
Yapotu
- Functional description:
  
  The main functionalities are as follows:
  1) building OTUs from a fasta file (swarm, vsearch, ...) or a distance file (yapotu) for an environmental sample
  2) building a fasta file and a distance file per OTU
  3) checking the consistency of the OTUs by displaying them as a graph (see OTU as a graph below)
  4) displaying the shape of an OTU or of a set of OTUs by Multidimensional Scaling
  5) implementing Hierarchical Aggregative Clustering of an OTU or a set of OTUs with various aggregation methods
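
One simple way to picture OTU building from a distance file is to link two sequences whenever their distance is at most a threshold and to take the connected components of the resulting graph as OTUs. The sketch below assumes exactly that; whether yapotu builds OTUs this way is not stated here, and the threshold, the input layout and the function name are hypothetical.

```python
from collections import defaultdict


def otus_from_distances(pairs, threshold):
    """Connected components of the 'distance <= threshold' graph.

    pairs: iterable of (id_a, id_b, distance) triples.
    """
    neighbours = defaultdict(set)
    for a, b, d in pairs:
        neighbours[a]  # touch both ids so isolated sequences still get a node
        neighbours[b]
        if d <= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, otus = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        otus.append(sorted(component))
    return otus


# Toy distances, invented for the example.
pairs = [
    ("s1", "s2", 1), ("s2", "s3", 2), ("s1", "s3", 4),
    ("s3", "s4", 9), ("s1", "s4", 10), ("s2", "s4", 9),
    ("s4", "s5", 1),
]
otus = otus_from_distances(pairs, threshold=3)
```

Note that s1 and s3 end up in the same OTU although their distance exceeds the threshold, because both are close to s2. Cutting the graph this way gives the same groups as single-linkage hierarchical clustering cut at the same height; other aggregation methods generally give different groups.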
- Scientific description:
- Privileged contact:
  
  Alain Franc (alain.franc@inria.fr)
- Participants:
- Structures:
  
  PLEIADE
- Website:
  
  https://gitlab.inria.fr/biodiversiton/yap
Metage2Metabo
- Functional description:
  
  Metabolic networks are graphs whose nodes are compounds and whose edges are biochemical reactions. To study the metabolic capabilities of microbiota, Metage2Metabo uses multiprocessing to reconstruct metabolic networks at large scale. The individual and collective metabolic capabilities (number of producible compounds) are computed and compared. From these comparisons, a set of compounds producible only by the community is created. These newly producible compounds are used to find minimal communities that can produce them. From these communities, the keystone species in the production of these compounds are identified.
- Scientific description:
  
  Flexible pipeline for the metabolic screening of large scale microbial communities described by reference genomes or metagenome-assembled genomes.
  The pipeline comprises several main steps.
  (1) Automatic and parallel reconstruction of metabolic networks.
  (2) Computation of individual metabolic potentials
  (3) Computation of collective metabolic potential
  (4) Calculation of the cooperation potential described as the set of metabolites producible by species only in a cooperative context
  (5) Computation of minimal-sized communities satisfying a metabolic objective
  (6) Extraction of key species (essential and alternative symbionts) associated with a metabolic function
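
Steps (2) to (4) can be pictured with a small Python sketch. It assumes that producibility is computed by a simple network expansion (a reaction fires once all of its reactants are available) and that the collective potential is obtained by pooling the reactions of all species. The data layout, the names and the toy networks are hypothetical and are not the Metage2Metabo interface.

```python
def scope(reactions, seeds):
    """Compounds reachable from the seeds by network expansion.

    reactions: iterable of (reactants, products) pairs of sets.
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


# Toy community, invented for the example.
networks = {
    "species_A": [({"glucose"}, {"pyruvate"}), ({"lactate"}, {"butyrate"})],
    "species_B": [({"pyruvate"}, {"lactate"})],
}
seeds = {"glucose"}

# (2) individual potentials: each species on its own.
individual = {name: scope(rxns, seeds) for name, rxns in networks.items()}

# (3) collective potential: all reactions pooled together.
pooled = [reaction for rxns in networks.values() for reaction in rxns]
collective = scope(pooled, seeds)

# (4) cooperation potential: producible only in a cooperative context.
cooperation_potential = collective - set().union(*individual.values())
```

In this toy case neither species can make lactate or butyrate alone, but the pooled network can, so both compounds fall in the cooperation potential.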
- Privileged contact:
  
  Clemence Frioux (clemence.frioux@inria.fr)
- Participants:
  
  Clemence Frioux (clemence.frioux@inria.fr), Arnaud Belcour (arnaud.belcour@irisa.fr), Anne Siegel (Anne.Siegel@irisa.fr)
- Structures:
  
  DYLISS, PLEIADE
- Website:
  
  https://github.com/AuReMe/metage2metabo
MiSCoTo
- Functional description:
  
  Metabolic networks are composed of biochemical reactions and gather the expected metabolic capabilities of species. For organisms that live together in interaction (microbiotas), complementarity between these networks can be exploited to predict cooperation events. This software takes as inputs metabolic networks for various species (host, symbionts of the microbiota), components of the growth medium and a metabolic objective (metabolites to be produced), and aims at selecting a minimal set of symbionts that ensures the metabolic objective can be achieved. The software offers two levels of modeling: a simplified one, and another that takes into account the cost of metabolic exchanges and aims at minimizing it.
- Scientific description:
  
  MiSCoTo solves combinatorial problems using Answer Set Programming. It aims at minimizing either the number of selected species or both the number of selected species and the cost of the interaction between them, characterized by the number of metabolic exchanges. In the first case, the level of modeling is called lumped or mixed-bag, in the latter, it is compartmentalized.
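
The lumped (mixed-bag) selection problem can be illustrated by a brute-force Python sketch that enumerates symbiont subsets by increasing size and stops at the first size for which the pooled network produces the targets. Exhaustive enumeration is used here only for readability; it stands in for the Answer Set Programming encoding and does not scale. The function names, data layout and toy networks are hypothetical.

```python
from itertools import combinations


def producible(reactions, seeds):
    """Network expansion over (reactants, products) pairs of sets."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for reactants, products in reactions:
            if reactants <= available and not products <= available:
                available |= products
                changed = True
    return available


def minimal_communities(host, symbionts, seeds, targets):
    """All smallest symbiont subsets that, pooled with the host, yield the targets."""
    names = sorted(symbionts)
    for size in range(len(names) + 1):
        found = []
        for subset in combinations(names, size):
            pooled = list(host)
            for name in subset:
                pooled.extend(symbionts[name])
            if targets <= producible(pooled, seeds):
                found.append(set(subset))
        if found:  # stop at the first size that works
            return found
    return []


# Toy host and symbionts, invented for the example.
host = [({"a"}, {"b"})]
symbionts = {
    "s1": [({"b"}, {"c"})],
    "s2": [({"c"}, {"t"})],
    "s3": [({"b"}, {"t"})],
}
solutions = minimal_communities(host, symbionts, seeds={"a"}, targets={"t"})
```

Here s1 and s2 together could also produce the target, but s3 alone suffices, so the only minimal community is {s3}. The compartmentalized level would additionally count and minimize the exchanges between species, which this sketch does not attempt.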
- Privileged contact:
  
  Clemence Frioux (clemence.frioux@inria.fr)
- Participants:
  
  Clemence Frioux (clemence.frioux@inria.fr), Anne Siegel (anne.siegel@irisa.fr), Enora Fremy (enora.fremy@irisa.fr), Camille Trottier, Arnaud Belcour (arnaud.belcour@irisa.fr)
- Structures:
  
  DYLISS, PLEIADE
- Website:
  
  https://github.com/cfrioux/miscoto
MeneTools
- Functional description:
  
  MeneTools consists of four topological tools for analyzing metabolic models from a graph-based perspective.
  Menecheck verifies the producibility of target compounds from available substrates (growth medium) of the metabolic network.
  Menescope gives the whole range of accessible compounds in the metabolic network starting from substrates.
  Menepath gives the production paths of given compounds in the model.
  Menecof proposes compounds that need to be produced or added as substrate for ensuring the producibility of targets.
- Scientific description:
  
  MeneTools are a set of tools for the exploration of the producibility potential in a metabolic network using the network expansion algorithm. The MeneTools can:
  – assess whether targets are producible starting from nutrients (Menecheck)
  – get all compounds that are producible starting from nutrients (Menescope)
  – get all reactions that are activable from nutrients (Meneacti)
  – get production paths of specific compounds (Menepath)
  – obtain compounds that, if added to the nutrients, would ensure the producibility of targets (Menecof)
  – identify metabolic deadends, i.e. metabolites that act as reactants of reactions but never as products, or metabolites that act as products of reactions but never as reactants. This is a purely structural analysis.
  All MeneTools that rely on modelling use the definition of producibility given by the network expansion algorithm.
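
The dead-end analysis described above is simple enough to sketch in a few lines of Python. The sketch assumes reactions are given as (reactants, products) pairs of sets and ignores reversibility; the function name and the toy network are hypothetical and do not reflect the MeneTools command-line or Python interface.

```python
def dead_ends(reactions):
    """Metabolites seen only as reactants, or only as products.

    reactions: iterable of (reactants, products) pairs of sets.
    """
    consumed, produced = set(), set()
    for reactants, products in reactions:
        consumed |= reactants
        produced |= products
    return {
        "never_produced": consumed - produced,
        "never_consumed": produced - consumed,
    }


# Toy network, invented for the example.
reactions = [
    ({"a"}, {"b"}),
    ({"b", "c"}, {"d"}),
    ({"d"}, {"b"}),
    ({"b"}, {"e"}),
]
result = dead_ends(reactions)
```

Since the analysis is purely structural, nutrients show up among the never-produced metabolites and end products among the never-consumed ones; telling those apart from genuine gaps in the network requires knowledge of the growth medium and targets.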
- Privileged contact:
  
  Clemence Frioux (clemence.frioux@inria.fr)
- Participants:
  
  Clemence Frioux (clemence.frioux@inria.fr), Anne Siegel (anne.siegel@irisa.fr), Arnaud Belcour (arnaud.belcour@irisa.fr)
- Structures:
  
  DYLISS, PLEIADE
- Website:
  
  https://github.com/cfrioux/MeneTools
Fluto
- Functional description:
  
  Fluto relies on Answer Set Programming (ASP) and a hybrid model that couples ASP with a Linear Programming (LP) constraint propagator. Models satisfying the qualitative constraints of network expansion are tested for satisfiability of flux constraints with the LP propagator. The resulting answer sets permit the completion of a metabolic network so that the metabolic reaction of interest is activated according to both formalisms.
- Scientific description:
  
  Fluto performs metabolic network completion with respect to topological and linear reaction rate constraints based on the stoichiometry of metabolic reactions.
- Privileged contact:
  
  Clemence Frioux (clemence.frioux@inria.fr)
- Participants:
  
  Sven Thiele
- Structures:
  
  DYLISS, PLEIADE
- Website:
  
  https://github.com/cfrioux/fluto/
Alcyone
- Functional description:
  
  Alcyone defines a file structure for specifying bioinformatics analysis environments, including tool choice, interoperability, and sources of raw data. These specifications are recorded in a Git repository. Alcyone compiles a specification into a master Docker container that deploys and orchestrates containers for each of the component tools. Alcyone can restore any version of an environment recorded in the Git repository.
- Scientific description:
  
  Alcyone conceives the user's computing environment as a microservices architecture, where each bioinformatics tool in the specification is a separate containerized Docker service. Alcyone builds a master container for the specified environment that is responsible for building, updating, deploying and stopping these containers, as well as recording and sharing the environment in a Git repository. The master container can be manipulated using a command-line interface.
- Privileged contact:
  
  David Sherman (david.sherman@inria.fr)
- Participants:
  
  Louise-Amelie Schmitt (louise-amelie.schmitt@inria.fr), David Sherman (david.sherman@inria.fr)
- Structures:
  
  PLEIADE
- Website:
  
  https://team.inria.fr/pleiade/alcyone/
magecal
- Functional description:
  
  Magecal predicts a set of protein coding genes in fungal genomic sequences, using different de novo prediction algorithms, and reconciling the predictions with the aid of comparative data. Magecal applies consistency constraints to guarantee that the predicted genes are biologically valid.
- Scientific description:
  
  Magecal independently runs training and prediction steps for Augustus, Conrad, GeneID, GeneMark, and Snap. The results are cleaned and integrated into a common format. Jigsaw is trained and used for model reconciliation. Consistency constraints are applied to ensure that phase and intron structure are biologically plausible.
- Privileged contact:
  
  David Sherman (david.sherman@inria.fr)
- Participants:
  
  Pascal Durrens (pascal.durrens@inria.fr), David Sherman (david.sherman@inria.fr)
- Structures:
  
  MAGNOME, PLEIADE
- Website:
  
  https://gitlab.inria.fr/magecal/magecal
family-3d
- Functional description:
  
  Family-3D lays out high-dimension protein family point clouds in 3D space. The resulting lower-dimension forms can be printed, so that they can be explored and compared manually. They can also be explored interactively or stereographically.
  
  Comparison of the 3D forms reveals classes of structurally similar families, whose characteristic shapes correspond to different evolutionary scenarios. Some of these scenarios are: neofunctionalization, subfunctionalization, founder gene effect, ancestral family.
  
  To facilitate curator training, Family-3D includes an interactive terminal containing a microcontroller, an RFID reader, and an LED ring. A set of shapes that fall in predetermined classes is printed, with a unique RFID tag in each shape. Trainees classify family shapes by manual inspection and submit their classes to the terminal, which evaluates the proposed class and provides visual feedback.
- Scientific description:
  
  The method statistically selects a subset of pairwise distances between proteins in the family, constructs a weighted graph, and lays it out using an adaptation of the three-dimensional extension of the Kamada-Kawai force-directed layout.
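
For readers unfamiliar with Kamada-Kawai layouts, the quantity such a layout minimizes can be written down in a few lines. The Python sketch below evaluates the standard Kamada-Kawai spring energy for 3D positions, with stiffness proportional to the inverse squared target distance. It shows the textbook energy only, not the adaptation used by Family-3D, and it neither selects distances nor performs the minimization; all names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def kamada_kawai_energy(positions, target, spring_constant=1.0, unit_length=1.0):
    """Standard Kamada-Kawai energy of a 3D layout.

    positions: {node: (x, y, z)}
    target: {(u, v): d_uv} with u < v, the distances the layout should respect.
    """
    energy = 0.0
    for u, v in combinations(sorted(positions), 2):
        d = target[(u, v)]
        stiffness = spring_constant / d ** 2  # long springs are softer
        ideal = unit_length * d
        actual = math.dist(positions[u], positions[v])
        energy += 0.5 * stiffness * (actual - ideal) ** 2
    return energy


# Toy target distances and two candidate layouts, invented for the example.
target = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "c"): math.sqrt(2.0)}
faithful = {"a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0), "c": (0.0, 1.0, 0.0)}
stretched = {"a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0), "c": (0.0, 3.0, 0.0)}

energy_faithful = kamada_kawai_energy(faithful, target)
energy_stretched = kamada_kawai_energy(stretched, target)
```

A layout that reproduces every target distance has energy close to zero, and any distortion raises it; a force-directed solver moves the points to lower this value.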
- Privileged contact:
  
  David Sherman (david.sherman@inria.fr)
- Participants:
  
  David Sherman (david.sherman@inria.fr)
- Structures:
  
  PLEIADE
- Website:
  
  https://gitlab.inria.fr/pleiade/family-3d
Diagno-Syst
- Functional description:
  
  Diagno-syst builds accurate inventories for biodiversity. It performs supervised clustering of reads obtained from a next-generation sequencing experiment, mapping onto an existing reference database, and assignment of taxonomic annotations.
- Scientific description:
- Privileged contact:
  
  Alain Franc (alain.franc@inria.fr)
- Participants:
  
  Alain Franc (alain.franc@inria.fr), Jean-Marc Frigerio (jean-marc.frigerio@inria.fr), Philippe Chaumeil (philippe.chaumeil@inria.fr), Franck Salin (franck.salin@inria.fr)
- Structures:
  
  PLEIADE
- Website: