Software

  • BlockCluster

  • cfda



    • The R package cfda performs:

      – descriptive statistics for categorical functional data

      – dimension reduction and optimal encoding of states (an extension of multiple correspondence analysis to functional data)

      – approximation methods for the analysis of multivariate categorical functional data.



    • https://github.com/modal-inria/cfda
  • clere

  • Clustericat



    • Clustericat is an R package for model-based clustering of categorical data. In this package, the Conditional Correlated Model (CCM), published in 2014, takes into account the main conditional dependencies between variables through extreme dependence situations (independence and deterministic dependence). Clustericat performs model selection, returning the best model according to the BIC criterion together with its maximum likelihood estimates.


    • https://r-forge.r-project.org/R/?group_id=1803
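      BIC-based model selection can be sketched as follows. This is a minimal illustration, not Clustericat's code: the candidate model names, log-likelihoods and parameter counts are hypothetical, and the convention BIC = -2 log L + k log n (lower is better) is assumed.

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 log L + k log n; with this sign convention, lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_by_bic(candidates: dict, n_obs: int) -> str:
    """Return the name of the candidate model with the smallest BIC.

    `candidates` maps a model name to a tuple
    (maximised log-likelihood, number of free parameters).
    """
    return min(candidates, key=lambda name: bic(*candidates[name], n_obs))

# Hypothetical fitted models: a richer dependence structure gains likelihood
# but pays a larger complexity penalty.
candidates = {
    "independence": (-1520.0, 12),
    "dependence_A": (-1490.0, 20),
    "dependence_B": (-1485.0, 40),
}
best = select_by_bic(candidates, n_obs=500)
```

      With these made-up numbers, "dependence_A" is selected: its likelihood gain over "independence" outweighs its extra parameters, while "dependence_B" gains too little likelihood for its 40 parameters.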
  • CoModes



    • CoModes is another R package for model-based clustering of categorical data. In this package, the Conditional Modes Model (CMM), submitted for publication in 2014, takes into account the main conditional dependencies between variables through particular modality crossings (so-called modes). CoModes performs model selection, returning the best model according to the exact integrated likelihood criterion together with its maximum likelihood estimates.


    • https://r-forge.r-project.org/R/?group_id=1809
  • CorReg



    • The main idea of the CorReg package is to consider sub-regression models, in which some variables are defined by others. Some of the variables can then be temporarily removed to overcome the ill-conditioned matrices that correlated covariates cause in linear regression, and the deleted information is then reinjected based on the structure that links the variables. The final model therefore takes into account all the variables, without suffering from the consequences of correlations between variables or of high dimension.


    • https://cran.r-project.org/web/packages/CorReg/index.html
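      The remove-then-reinject idea can be illustrated on the smallest possible case: a single sub-regression x2 ≈ a0 + a1·x1. This is a hypothetical sketch of the general principle, not the estimator implemented in CorReg; the data are synthetic and the response is noise-free so that the recovered coefficients can be checked exactly.

```python
def ols_1d(x, y):
    """Least squares fit y ≈ intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Synthetic data: x2 is almost a linear function of x1 (strong correlation).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
small_noise = [0.05, -0.03, 0.02, -0.04, 0.01, -0.01]
x2 = [2.0 * u + e for u, e in zip(x1, small_noise)]
# Noise-free response with true coefficients b1 = 1.5 and b2 = 0.5.
y = [1.5 * u + 0.5 * v for u, v in zip(x1, x2)]

# Step 1: estimate the sub-regression x2 ~ x1 and keep its residuals.
a0, a1 = ols_1d(x1, x2)
sub_resid = [v - (a0 + a1 * u) for u, v in zip(x1, x2)]

# Step 2: marginal model with x2 temporarily removed: y ~ x1 only.
c0, c1 = ols_1d(x1, y)
marg_resid = [v - (c0 + c1 * u) for u, v in zip(x1, y)]

# Step 3: reinject the information carried by x2 through the
# sub-regression residuals, then correct the coefficient of x1.
_, b2_hat = ols_1d(sub_resid, marg_resid)
b1_hat = c1 - a1 * b2_hat
```

      Because least squares residuals are orthogonal to x1 and to the intercept, step 3 recovers b2 exactly here (a Frisch-Waugh-Lovell argument); with a noisy response the estimates are only approximate.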
  • FunHDDC

  • FunFEM

  • Galaxy – MPAgenomics



    • Galaxy is an open, web-based platform for data-intensive biomedical research. Galaxy features a user-friendly interface, workflow management and sharing functionalities, and is widely used in the biology community. The MPAgenomics R package developed by MODAL has been integrated into Galaxy, and the Galaxy MODAL instance has been publicly deployed thanks to the IFB-cloud infrastructure.


    • https://cloud.france-bioinformatique.fr/accounts/login/
  • HDPenReg

  • MASSICCC



    • The MASSICCC web application offers a simple and dynamic interface for analysing heterogeneous data in a web browser. Various statistical analysis packages are available (Mixmod, MixtComp, BlockCluster), which allow supervised and unsupervised classification of large data sets.


    • https://massiccc.lille.inria.fr
  • MetaMA



    • metaMA is specialised software for microarray data. It is an R package that combines either p-values or modified effect sizes from different studies to find differentially expressed genes. The main competitor of metaMA is geneMeta. Compared with geneMeta, metaMA offers an improvement for datasets with small sample sizes, since its modelling is based on shrinkage approaches.


    • https://cran.r-project.org/web/packages/metaMA/index.html
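      Combining one-sided p-values across studies can be sketched with the inverse normal method. This is a generic illustration, not metaMA's code; the weights proportional to the square root of each study's sample size are an assumption (a common choice), and only the Python standard library is used.

```python
import math
from statistics import NormalDist

def inverse_normal_combination(pvalues, sample_sizes):
    """Combine one-sided p-values for one gene across independent studies.

    Each p-value is mapped to a z-score, the z-scores are combined with
    weights sqrt(n_s) normalised to a unit sum of squares, and the combined
    z-score is mapped back to a one-sided p-value.
    """
    std = NormalDist()  # standard normal distribution
    weights = [math.sqrt(n) for n in sample_sizes]
    norm = math.sqrt(sum(w * w for w in weights))
    z = sum(w * std.inv_cdf(1.0 - p) for w, p in zip(weights, pvalues)) / norm
    return 1.0 - std.cdf(z)

# Two moderately significant studies reinforce each other.
combined = inverse_normal_combination([0.04, 0.03], [12, 20])
```

      The combined p-value is smaller than either study's p-value, which is the point of pooling evidence for the same gene across studies.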
  • metaRNASeq



    • metaRNASeq is specialised software for RNA-seq experiments. This R package is an adaptation of the metaMA package, which performs meta-analysis of microarray data. Both take advantage of empirical Bayesian approaches, which are especially appropriate in a high-dimensional context. However, the specificities of the two technologies require some adaptations for each, which explains the development of two separate packages. To facilitate their use by a broad audience, a Galaxy web instance named SMAGEXP has been created that gathers the two packages.


    • https://cran.r-project.org/web/packages/metaRNASeq/index.html
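      Fisher's method is another classical way to combine p-values from independent studies. The sketch below is a generic illustration, not metaRNASeq's code; it uses the closed-form survival function of a chi-square distribution with an even number (2k) of degrees of freedom so that only the standard library is needed.

```python
import math

def fisher_combination(pvalues):
    """Fisher's method for k independent p-values.

    S = -2 * sum(log p_i) follows a chi-square distribution with 2k degrees
    of freedom under the global null hypothesis. Returns (S, combined p-value),
    using the closed form
        P(chi2_{2k} > s) = exp(-s/2) * sum_{i=0}^{k-1} (s/2)^i / i!.
    """
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    term, total = 1.0, 1.0  # the i = 0 term of the series
    for i in range(1, k):
        term *= half / i  # builds (s/2)^i / i! incrementally
        total += term
    return stat, math.exp(-half) * total

stat, combined = fisher_combination([0.05, 0.05])
```

      With a single study the method returns the input p-value unchanged; with two studies at p = 0.05 the combined p-value drops below 0.05.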
  • MixAll



    • MixAll is a model-based clustering package for modelling mixed data sets. It has been engineered around the idea of easy and quick integration of any kind of mixture model for any kind of data, under the conditional independence assumption. Five models (Gaussian mixtures, categorical mixtures, Poisson mixtures, Gamma mixtures and kernel mixtures) are currently implemented. MixAll natively handles completely missing values, assuming they are missing at random. MixAll is used as an R package, but its internals are coded in C++ as part of the STK++ library (www.stkpp.org) for faster computation.


    • https://cran.r-project.org/web/packages/MixAll/
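      Under the conditional independence assumption, the density of an observation within a cluster is the product of one density per variable, so a value that is missing at random simply drops its factor. The sketch below illustrates this with an EM algorithm on a two-variable mixed data set (one Gaussian variable, one categorical variable), where None marks a missing value. It is a minimal hypothetical sketch, not MixAll's implementation or API; the data, initial values and function names are made up.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_mixed(data, n_levels, init, n_iter=50):
    """EM for a mixture of (Gaussian x categorical) under conditional independence.

    data: list of (x, c) pairs, x a float or None, c an int in range(n_levels) or None.
    init: list of per-cluster dicts with keys "pi", "mean", "var", "probs".
    """
    params = [dict(p) for p in init]
    for _ in range(n_iter):
        # E-step: responsibilities; a missing variable contributes no factor.
        resp = []
        for x, c in data:
            w = []
            for p in params:
                f = p["pi"]
                if x is not None:
                    f *= gaussian_pdf(x, p["mean"], p["var"])
                if c is not None:
                    f *= p["probs"][c]
                w.append(f)
            s = sum(w)
            resp.append([wi / s for wi in w])
        # M-step: each parameter uses only rows where its variable is observed.
        for k, p in enumerate(params):
            rk = [r[k] for r in resp]
            p["pi"] = sum(rk) / len(data)
            obs_x = [(r, x) for r, (x, _) in zip(rk, data) if x is not None]
            wx = sum(r for r, _ in obs_x)
            p["mean"] = sum(r * x for r, x in obs_x) / wx
            p["var"] = max(sum(r * (x - p["mean"]) ** 2 for r, x in obs_x) / wx, 1e-6)
            obs_c = [(r, c) for r, (_, c) in zip(rk, data) if c is not None]
            wc = sum(r for r, _ in obs_c)
            p["probs"] = [sum(r for r, c in obs_c if c == level) / wc
                          for level in range(n_levels)]
    return params, resp

data = [(0.1, 0), (-0.2, 0), (0.3, None), (0.0, 0), (None, 0),
        (5.0, 1), (5.2, 1), (4.8, None), (5.1, 1), (None, 1)]
init = [{"pi": 0.5, "mean": 1.0, "var": 1.0, "probs": [0.6, 0.4]},
        {"pi": 0.5, "mean": 4.0, "var": 1.0, "probs": [0.4, 0.6]}]
params, resp = em_mixed(data, n_levels=2, init=init)
```

      Rows with a missing continuous value, such as (None, 0), are still assigned to a cluster through their categorical value alone.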
  • MixtComp.V4



    • MixtComp (Mixture Computation) is a model-based clustering package for mixed data developed by the Modal team (Inria Lille). It has been engineered around the idea of easy and quick integration of any new univariate model, under the conditional independence assumption. New models, resulting from research carried out by the Modal team or by other teams, will eventually be made available. The central architecture of MixtComp is now built, and its functionality has been field-tested through industry partnerships. Five basic models (Gaussian, Multinomial, Poisson, Weibull, NegativeBinomial) are implemented, as well as two advanced models (Functional and Rank). MixtComp natively handles missing data (completely missing values or values known only by interval). MixtComp is used as an R package, but its internals are coded in C++ using state-of-the-art libraries for faster computation.


    • https://github.com/modal-inria/MixtComp
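      A value that is only known to lie in an interval can be handled in a Gaussian component by using the probability mass of that interval in place of a density value, while a completely missing value contributes a factor of one. The sketch below shows this in the E-step for a single variable. It is a hypothetical illustration of the principle, not MixtComp's implementation; the component parameters are made up.

```python
from statistics import NormalDist

def component_likelihood(value, mean, sd):
    """Contribution of one Gaussian variable to a cluster's likelihood.

    value is a float (observed), a (low, high) tuple (known only by interval;
    use float("-inf") or float("inf") for open bounds) or None (completely missing).
    """
    dist = NormalDist(mean, sd)
    if value is None:
        return 1.0  # completely missing: the density integrates to one
    if isinstance(value, tuple):
        low, high = value
        return dist.cdf(high) - dist.cdf(low)  # probability mass of the interval
    return dist.pdf(value)  # fully observed: ordinary density

def responsibilities(value, components):
    """Posterior cluster probabilities for a single-variable observation.

    components: list of (proportion, mean, sd) tuples.
    """
    weights = [pi * component_likelihood(value, m, s) for pi, m, s in components]
    total = sum(weights)
    return [w / total for w in weights]

components = [(0.5, 0.0, 1.0), (0.5, 10.0, 1.0)]
post = responsibilities((8.0, float("inf")), components)  # "value is at least 8"
```

      Knowing only that the value exceeds 8 is enough to assign the observation almost surely to the second cluster, whereas a completely missing value leaves the posterior equal to the mixing proportions.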
  • MixtComp



    • MixtComp (Mixture Computation) is a model-based clustering package for mixed data developed by the Modal team (Inria Lille). It has been engineered around the idea of easy and quick integration of any new univariate model, under the conditional independence assumption. New models, resulting from research carried out by the Modal team or by other teams, will eventually be made available. The central architecture of MixtComp is now built, and its functionality has been field-tested through industry partnerships. Three basic models (Gaussian, multinomial, Poisson) are implemented, as well as two advanced models (Ordinal and Rank). MixtComp natively handles missing data (completely missing values or values known only by interval). MixtComp is used as an R package, but its internals are coded in C++ using state-of-the-art libraries for faster computation.


    • https://cran.r-project.org/web/packages/RMixtComp/index.html
  • MixCluster



    • MixCluster is an R package for model-based clustering of mixed data (continuous, binary, integer). In this package, the model, submitted for publication in 2014, takes into account the main conditional dependencies between variables through a Gaussian copula. MixCluster performs model selection and selects the best model using Bayesian approaches.


    • https://r-forge.r-project.org/R/?group_id=1939
  • MPAgenomics

  • ordinalClust



    • Classification, clustering and co-clustering of ordinal data with a model-based approach relying on the BOS distribution.



  • PACBayesianNMF

  • pycobra



    • pycobra is a Python library for ensemble learning, which serves as a toolkit for regression, classification, and visualisation. It is scikit-learn compatible and fits into the existing scikit-learn ecosystem.

      pycobra offers a Python implementation of the COBRA algorithm introduced by Biau et al. (2016) for regression.

      Another implemented algorithm is the EWA (Exponentially Weighted Aggregate) aggregation technique (see, among several other references, Dalalyan and Tsybakov, 2007).

      Apart from these two regression aggregation algorithms, pycobra implements a version of COBRA for classification. This procedure was introduced by Mojirsheibani (1999).

      pycobra also offers various visualisation and diagnostic methods built on top of matplotlib, which let the user analyse and compare different regression machines with COBRA. The Visualisation class also lets you apply some of the tools (such as Voronoi tessellations) to other visualisation problems, such as clustering.



    • https://github.com/bhargavvader/pycobra
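      The two regression aggregation rules can be sketched in a few lines. In COBRA, the prediction at a query point is the average response of the aggregation points on which every base machine predicts within epsilon of what it predicts at the query; EWA, in its simplest uniform-prior form, instead weights machines proportionally to exp(-loss/temperature). This is a minimal hypothetical sketch, not pycobra's API: the function names are made up, the base machines are plain Python callables assumed to be trained on a separate split of the data, and the unanimity rule (all machines must agree) is the basic variant of COBRA.

```python
import math

def cobra_predict(machines, X_agg, y_agg, x, epsilon):
    """COBRA-style aggregation for one query point x.

    machines: fitted regressors (callables), trained on a different split
    than (X_agg, y_agg). Keeps the aggregation points where every machine's
    prediction is within epsilon of that machine's prediction at x, and
    averages their responses.
    """
    at_query = [m(x) for m in machines]
    kept = [yi for xi, yi in zip(X_agg, y_agg)
            if all(abs(m(xi) - q) <= epsilon for m, q in zip(machines, at_query))]
    if not kept:
        return None  # no agreeing point; a real implementation needs a fallback
    return sum(kept) / len(kept)

def ewa_weights(losses, temperature):
    """Exponentially weighted aggregate: weight_j proportional to exp(-loss_j / temperature)."""
    best = min(losses)  # shifting by the minimum avoids overflow/underflow
    raw = [math.exp(-(l - best) / temperature) for l in losses]
    total = sum(raw)
    return [r / total for r in raw]

machines = [lambda t: t, lambda t: t + 0.1]  # two toy "fitted" regressors
X_agg, y_agg = [0.0, 1.0, 2.0, 3.0], [0.2, 1.1, 1.9, 3.2]
pred = cobra_predict(machines, X_agg, y_agg, x=2.05, epsilon=0.5)
weights = ewa_weights([0.30, 0.10, 0.20], temperature=0.1)
```

      Only the aggregation point at 2.0 passes the consensus test here, so the COBRA prediction is its response; the EWA weights favour the machine with the smallest loss.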
  • PyRotor



    • PyRotor leverages available trajectory data to focus the search space and to estimate some properties, which are then incorporated into the optimisation problem. This constrains the optimisation problem in a natural and simple way, so that its solution inherits realistic patterns from the data. In particular, PyRotor does not require any knowledge of the dynamics of the system.


    • https://pypi.org/project/pyrotor/
  • RankCluster

  • Rmixmod



    • MIXMOD (MIXture MODelling) is important software for the Modal team, since it covers the team's main topics: model-based supervised, unsupervised and semi-supervised classification for various data situations. MIXMOD is now widely distributed, with more than 250 downloads per month recorded for several years. MIXMOD is written in C++ (more than 10 000 lines) and distributed under the GNU General Public License. Several other institutions have participated in MIXMOD development for several years: CNRS, Inria Saclay-




  • rtkore



    • STK++ (http://www.stkpp.org) is a collection of C++ classes for statistics, clustering, linear algebra, arrays (with an Eigen-like API), regression, dimension reduction, etc. The library is integrated into R using Rcpp. The rtkore package includes the header files from the STK++ core library. All files contain only templated classes or inlined functions. STK++ is licensed under the GNU LGPL version 2 or later. rtkore (the STK++ integration into R) is licensed under the GNU GPL version 2 or later. See file LICENSE.note for details.


    • https://cran.r-project.org/web/packages/rtkore/index.html
  • simerge



    • Performs co-clustering of binary (Bernoulli) and count (Poisson) variables using covariates.



  • STK++



    • STK++ (Statistical ToolKit in C++) is a versatile, fast, reliable and elegant collection of C++ classes for statistics, clustering, linear algebra, arrays (with an Eigen-like API), regression, dimension reduction, etc. The library is interfaced with LAPACK for many usual linear algebra methods. Some functionalities of the library are available in the R environment through the rtkpp and rtkore packages.

      STK++ is suitable for projects ranging from small one-off projects to complete data mining application suites.



    • http://www.stkpp.org
  • MLGL



    • The MLGL R package, standing for Multi-Layer Group-Lasso, implements a variable selection procedure for settings with redundancy between explanatory variables, a situation that is common with high-dimensional data.
      The MLGL approach combines variable aggregation and selection in order to improve interpretability and performance. First, a hierarchical clustering procedure provides, at each level, a partition of the variables into groups. Then, the set of groups of variables from the different levels of the hierarchy is given as input to the group-Lasso, with weights adapted to the structure of the hierarchy. At this step, the group-Lasso outputs sets of candidate groups of variables for each value of the regularization parameter.
      The flexibility to choose groups at different levels of the hierarchy would a priori induce a high computational complexity. However, MLGL exploits the structure of the hierarchy and the weights used in the group-Lasso to greatly reduce the final time cost.
      The final choice of the regularization parameter, and therefore the final choice of groups, is made by a multiple hierarchical testing procedure.


    • https://cran.r-project.org/web/packages/MLGL/index.html
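      At the heart of many group-Lasso solvers is the proximal operator of the weighted group penalty, which either shrinks a whole group of coefficients towards zero or sets it exactly to zero; this is how entire groups of variables are kept or discarded. The sketch below shows this block soft-thresholding step. It is a generic illustration, not MLGL's code: the groups and weights are hypothetical placeholders for the hierarchy-adapted weights described above, and for simplicity the groups are assumed not to overlap, whereas groups taken from different levels of a hierarchy are nested.

```python
import math

def group_soft_threshold(z, lam, weight):
    """Proximal operator of lam * weight * ||beta_g||_2 applied to the vector z.

    Returns max(0, 1 - lam * weight / ||z||) * z: the whole group is either
    shrunk proportionally or set exactly to zero.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam * weight:
        return [0.0] * len(z)
    scale = 1.0 - lam * weight / norm
    return [scale * v for v in z]

def prox_all_groups(z, groups, lam, weights):
    """Apply the group operator to each non-overlapping group of indices."""
    beta = list(z)
    for g, w in zip(groups, weights):
        shrunk = group_soft_threshold([z[i] for i in g], lam, w)
        for i, v in zip(g, shrunk):
            beta[i] = v
    return beta

z = [3.0, 4.0, 0.3, -0.4]
groups = [[0, 1], [2, 3]]  # hypothetical groups from one level of a variable hierarchy
beta = prox_all_groups(z, groups, lam=1.0, weights=[2.5, 1.0])
```

      The first group (norm 5) is halved to [1.5, 2.0], while the second group (norm 0.5, below its threshold of 1.0) is removed entirely.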

 
