Software

Libraries or management tools for high throughput sequencing data

  • GATB Library: the Genome Analysis Toolbox with de Bruijn graph. Many of the tools developed by the GenScale team are based on this library.
    These tools enable the analysis of data sets of any size on multi-core desktop computers, including very large amounts of read data from any kind of organism (bacteria, plants, animals) and even from complex samples (e.g. metagenomes). The full list is available here: https://gatb.inria.fr/software/
  • LRez: C++ library and toolkit for barcode-based management and indexing of linked-read datasets.

Variant calling and/or genotyping

  • DiscoSNP++ and discoSnpRAD: Reference-free small variant discovery (SNPs and indels)
  • MindTheGap: Detection and assembly of large insertion variants
  • TakeABreak: reference-free inversion discovery tool
  • SVJedi: Structural Variant genotyper with long read data
  • SVJedi-graph: Structural Variant genotyper with long read data using a variation graph

Sequence assembly

  • MinYS: reference-guided genome assembly in metagenomic data
  • MTG-link: local assembly tool for linked-read data
  • Minia: De novo short read assembler
  • de-novo pipeline: de novo assembly pipeline (error correction / contigs / scaffolding) for genomes and metagenomes
  • Mapsembler2: Targeted assembly (not maintained)

Managing k-mers & indexation

  • findere: simple strategy for speeding up queries and for reducing false positive calls from any Approximate Membership Query data structure.
    • fimpera: extends findere by adding abundance information.
  • kmtricks: modular tool suite for counting k-mers and constructing Bloom filters or k-mer matrices for large collections of sequencing data.
  • kmindex: tool for indexing and querying sequencing samples, built on top of kmtricks.
  • back to sequences: finds sequences (reads, unitigs, genes) related to a set of k-mers in large datasets, in a matter of seconds.
  • Backpack Quotient Filter: k-mer indexing data structure with abundance
  • short read connector: detection of similar reads in potentially large read sets
  • DSK: k-mer counting in sequences
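
Several of the tools above revolve around counting k-mers. As a rough illustration of the underlying idea only, here is a minimal in-memory sketch in Python. It is not how DSK or kmtricks are implemented (those are built for data sets far too large for this approach); the function names are hypothetical, and counting canonical k-mers (a k-mer and its reverse complement treated as one) is an assumption made for this example.

```python
from collections import Counter

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_canonical_kmers(sequences, k):
    """Count canonical k-mers over an iterable of DNA strings.

    Hypothetical helper for illustration: everything is held in memory,
    and k-mers containing characters other than A, C, G, T are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip k-mers with ambiguous bases such as N
            # The canonical form is the lexicographically smaller of
            # the k-mer and its reverse complement.
            counts[min(kmer, reverse_complement(kmer))] += 1
    return counts


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGT"]
    for kmer, n in sorted(count_canonical_kmers(reads, 3).items()):
        print(kmer, n)
```

Holding every count in one dictionary only works for small inputs, which is precisely the limitation that dedicated k-mer counters are designed to overcome.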

Pangenome graph manipulation

  • Pancat: Pangenome Comparison and Analysis Toolkit
  • GFAGraphs: a Python library to handle pangenome graph files in GFA format.
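
For readers unfamiliar with GFA, the sketch below reads segment (S) and link (L) records from GFA version 1 text using plain Python. It does not use the GFAGraphs API; the function name `parse_gfa` and the returned structures are hypothetical choices for this example, and all other record types, as well as optional tags, are ignored.

```python
def parse_gfa(lines):
    """Parse S and L records from an iterable of GFA1 lines.

    Hypothetical helper for illustration. Returns a dict mapping
    segment name to sequence, and a list of links as tuples
    (from_segment, from_orient, to_segment, to_orient, overlap).
    """
    segments = {}
    links = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            # S <name> <sequence> [optional tags...]
            segments[fields[1]] = fields[2]
        elif fields[0] == "L" and len(fields) >= 6:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            links.append(tuple(fields[1:6]))
    return segments, links


if __name__ == "__main__":
    example = [
        "H\tVN:Z:1.0",
        "S\ts1\tACGT",
        "S\ts2\tGTTA",
        "L\ts1\t+\ts2\t+\t2M",
    ]
    segs, lnks = parse_gfa(example)
    print(segs)
    print(lnks)
```

A dedicated library is preferable for real pangenome graphs, where paths, walks and tags also matter.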

Comparative metagenomics with k-mers

Species and bacterial strains identification

  • ORI: software using long nanopore reads to identify bacteria present in a sample at the strain level
  • StrainFLAIR: STRAIN-level proFiLing using vArIation gRaph

General-purpose sequencing data manipulation

  • GASSST: long read mapper
  • Leon: short read compressor (now included in GATB-core)
  • Bloocoo: short read corrector
  • BCALM: Construct compacted de Bruijn graphs (unitigs)

Protein Structure

  • A_Purva: Contact Map Overlap solver
  • MD-Jeep: Distance Geometry solver
  • CSA: Comparative Structural Alignment

Workflow

  • SLICEE: parallel execution of bioinformatics workflows

Comparative Genomics

  • CASSIS: detection of rearrangement breakpoints
  • PLAST: intensive bank-to-bank sequence comparison
  • DRJBreakpointFinder: detection and precise localization of excision sites in proviral segments

Permanent link to this article: https://team.inria.fr/genscale/software-2/