Research – Activity Reports

You can browse the whole of TAO/TAU history through our activity reports (HTML format, PDF available on front pages): 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017 (TAO became TAU), 2018, 2019, 2020, 2021, 2022

Overall objectives


Building upon the expertise of the former TAO project-team in machine learning (ML), stochastic optimization, and statistical physics, the TAU team aims to tackle the under-specification inherent to Big Data applications. Based on the claim that (sufficiently) big data can, to some extent, compensate for a lack of knowledge, Big Data is expected to fulfill all the promises of Artificial Intelligence.

This makes Big Data under-specified in three respects:

  • A first source of under-specification is related to common sense, and the gap between observation and interpretation. The acquired data do not report on “obvious” issues; yet what is obvious to humans is not necessarily so for the computer. Providing the machine with common sense is a many-faceted, AI-hard challenge. A current challenge is to interpret the data and cope with its blind zones (e.g., missing values, contradictory examples, …).
  • A second source of under-specification regards the steering of a Big Data system. Such systems commonly require lifelong learning in order to deal with open environments and users with diverse profiles, levels of expertise, and expectations. A Big Data system is thus a dynamic process, whose behavior depends in a cumulative way upon its future environment. The challenge regards the control of such a lifelong learning system.
  • A third source of under-specification regards its social acceptability. There is little doubt that Big Data can pave the way for Big Brother, and ruin the social contract through modeling benefits and costs at the individual level. What are the fair trade-offs between safety, freedom and efficiency? We do not know the answers. A practical and scientific challenge is to first assess, and then enforce, the trustworthiness of solutions.
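As a small illustration of the "blind zones" mentioned above, the sketch below shows one of the simplest ways a learner can cope with missing values: column-mean imputation. This is a minimal, hypothetical example (the data and the imputation strategy are illustrative, not a method of the team):

```python
import numpy as np

# Toy dataset with a "blind zone": missing entries encoded as NaN.
X = np.array([
    [1.0, 2.0],
    [np.nan, 3.0],
    [4.0, np.nan],
    [5.0, 6.0],
])

# Mean imputation: replace each missing value with its column mean,
# computed while ignoring the NaN entries.
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)
```

Real systems typically use richer strategies (model-based imputation, or learners that handle missingness natively), but the principle of explicitly filling the blind zones before training is the same.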

However, several concerns have emerged in recent years regarding Big Data models. First, in industrial contexts, data is not always big, and many practical problems involve small data. Conversely, when big data is available, the arms race around LLMs has given birth to increasingly big models, involving hundreds of billions of parameters, and environmental concerns are growing, regarding not only their training but also their use and the inference process.

Our initial under-specification considerations, tempered by the concerns above, have led the team to align its research agenda along four pillars:

  • Frugal Learning, addressing the environmental concerns, in terms of deep network architectures and small-data regimes;
  • Causal Learning, a grounded way to address the trustworthiness issue by improving the explainability of results;
  • Bidirectional links with Statistical Physics, to better understand very large systems and improve their performance, both in terms of model accuracy and of energy consumption in their use;
  • Hybridization of Machine Learning with Numerical Simulations, again aiming at better efficiency while decreasing the computing needs.

Last but not least, the organization of challenges and the design of benchmarks, a cornerstone of Machine Learning nowadays, remain an active thread of the team's activity, in particular through the Codalab platform and its new version, Codabench.

Last activity report: 2023
