Software

  • Codalab

    The TAU group, under the leadership of Isabelle Guyon, is the community lead of the open-source Codalab platform, hosted by Université Paris-Saclay, whose goal is to host machine learning competitions and benchmarks. The project has grown tremendously during the evaluation period, reaching tens of thousands of active users and over 100 competitions organized each month. The engineering team, overseen by Anne-Catherine Letournel (CNRS engineer), includes two engineers dedicated full time to administering the platform and developing challenges: Adrien Pavao, financed by a project started in 2020 with the Région Ile-de-France, and Dinh-Tuan Tran, financed by Isabelle Guyon's ANR AI chair. Several other engineers are engaged as contractors on an as-needed basis. The rapid growth in usage led us to put a new infrastructure in place: we migrated the storage to a distributed MinIO deployment (4 physical servers, each with 12 disks of 16 TB) spread over 2 buildings for robustness, and added 10 GPUs to the 10 already in the backend. This provides substantial capacity to support industry-strength challenges, thanks to the sponsorship of Région Ile-de-France, ANR, Université Paris-Saclay, CNRS, INRIA, and ChaLearn.
    • Type of Software: Software as a Vehicle for Research (Open Source).
    • Audience: wide-audience software (aims to be usable by a wide public, to become the reference software in its area, etc.); stronger support is needed to achieve this.
    • Maintenance: Long Term Support.
    • Duration of Development: 7 years
  • Cartolabe

    Cartolabe has been under development since 2016. It has two main use cases: as a website (to explore a map of science, publications, …) and as an API (for scientists to test their NLP algorithms, or for data scientists to generate their own maps). After the work of Jonas Renault (2018-2020), a first version was used as a website for many applications and dissemination efforts: Grand Debat, COVID, debat RUA. After the work of H. Gozukan (2021-2022), Cartolabe is now ready to be used as an API, both by scientists (to test NLP algorithms) and by data scientists (to create new maps). A first hackathon was organized with the Bibliotheque Nationale de France to demonstrate how end users can use Cartolabe with their own datasets.
    • Type of software: Software as a Vector for Knowledge (as a website).
      Software as a Vehicle for Research (as an open-source API for other researchers).
    • Audience: as an API, to be used by people inside and outside the team, but without a clear
      and strong dissemination and support action plan (a dissemination plan has started
      with the organization of a hackathon/workshop).
      As a website, wide-audience software (aims to be usable by a wide public, to become
      the reference software in its area, etc.), to be used as a science map exploration tool.
    • Maintenance: basic maintenance to keep the software alive;
      Long Term Support (LTS) if we obtain the necessary staffing.
    • Duration of the Development: started in 2016; 5 years of INRIA/CNRS engineer time. Ongoing: one half-time engineer, plus part-time researcher development.
  • DNADNA

    DNADNA is a package for deep learning inference in population genetics. DNADNA provides utility functions to improve development of neural networks for population genetics and is currently based on PyTorch.
    In particular, it already implements several neural networks that allow inferring demographic and adaptive history from genetic data. Pre-trained networks can be used directly on real/simulated genetic polymorphism data for prediction. Implemented networks can also be optimized based on user-specified training sets and/or tasks. Finally, any user can implement new architectures and tasks, while benefiting from DNADNA input/output, network optimization, and test environment.
    DNADNA should allow researchers to focus on their research project, be it the analysis of population genetic data or the building of new methods, without having to spend effort on proper development methodology (unit tests, continuous integration, documentation, etc.). Results will thus be easier to reproduce and share. Having a common interface will also decrease the risk of bugs.
    • Type of Software: Software as a Vehicle for Research.
    • Audience: large audience software, usable by people inside and outside the
      field with a clear and strong dissemination, validation, and support action plan.
    • Maintenance: basic maintenance to keep the software alive.
    • Duration of the Development: 3 years
