Memory-Augmented Models for low-latency Machine Learning Services (MAMMALS)

A machine learning (ML) model is typically trained for inference, that is, to classify specific inputs (e.g., images) or to predict numerical values (e.g., the future position of a vehicle). The ubiquitous deployment of ML in time-critical applications and unpredictable environments poses fundamental challenges to ML inference. Big cloud providers, such as Amazon, Microsoft, and Google, offer “machine learning as a service” solutions, but running the models in the cloud may fail to meet the tight delay constraints (≤10 ms) of future 5G services, e.g., for connected and autonomous cars, industrial robotics, mobile gaming, and augmented and virtual reality. Such requirements can only be met by running ML inference at the edge of the network (directly on users’ devices or on nearby servers), without the computing and storage capabilities of the cloud. Privacy and data ownership also call for inference at the edge.

MAMMALS is an Inria exploratory action. It investigates new approaches to run inference under tight delay constraints and with limited resources. In particular, it aims to provide low-latency inference by running simple ML models close to the end user, where they can also take advantage of a (small) local datastore of examples. The focus is on algorithms that learn online what to store locally, in order to improve inference quality and adapt to the specific context.
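To make the idea concrete, here is a minimal Python sketch of a simple model paired with a small local store of examples. It is only an illustration under stated assumptions and not the method of any of the papers listed below: the class name, the `radius` threshold, and the nearest-neighbour lookup are hypothetical, and a plain least-recently-used (LRU) eviction rule stands in for the online learning algorithms the project actually studies.

```python
import math
from collections import OrderedDict


class MemoryAugmentedClassifier:
    """Toy pairing of a cheap local model with a small store of examples.

    Hypothetical illustration: the lookup rule and the LRU eviction policy
    are assumptions, not the algorithms studied in the project.
    """

    def __init__(self, base_model, capacity=3, radius=0.5):
        self.base_model = base_model  # cheap model running at the edge
        self.capacity = capacity      # maximum number of stored examples
        self.radius = radius          # maximum distance for a store hit
        self.store = OrderedDict()    # feature tuple -> label, in LRU order

    def predict(self, x):
        """Return (label, source), where source is 'store' or 'model'."""
        x = tuple(x)
        nearest = min(self.store, key=lambda s: math.dist(s, x), default=None)
        if nearest is not None and math.dist(nearest, x) <= self.radius:
            self.store.move_to_end(nearest)  # mark as recently used
            return self.store[nearest], "store"
        return self.base_model(x), "model"

    def update(self, x, label):
        """Store a labelled example, evicting the least recently used one."""
        x = tuple(x)
        self.store[x] = label
        self.store.move_to_end(x)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)


def sign_model(x):
    """Stand-in for a simple model: classify by the sign of the first feature."""
    return "positive" if x[0] >= 0 else "negative"


clf = MemoryAugmentedClassifier(sign_model, capacity=2, radius=0.5)
clf.update((1.0, 1.0), "special")
print(clf.predict((1.1, 0.9)))   # close to a stored example: answered from the store
print(clf.predict((-3.0, 0.0)))  # nothing nearby: falls back to the simple model
```

In this sketch the store overrides the simple model only when a stored example lies within `radius` of the query. Deciding which examples are worth keeping is left to the naive LRU rule here, and that decision is precisely where an online learning algorithm would come in.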


We have a postdoc position to fill. If you are interested, send an email to Giovanni Neglia.



  • Taking two Birds with one k-NN Cache [preprint]
    D. Carra and G. Neglia, Proc. of the 2021 IEEE Global Communications Conference (Globecom 2021), Madrid, Spain, December 7-11, 2021
  • AÇAI: Ascent Similarity Caching with Approximate Indexes [preprint], [extended]
    Tareq Si Salem, Giovanni Neglia, Damiano Carra, Proc. of the 33rd International Teletraffic Congress (ITC-33), online conference, August 31st-September 3rd, 2021. BEST PAPER AWARD
  • Towards Inference Delivery Networks: Distributing Machine Learning with Optimality Guarantees [preprint]
    Tareq Si Salem, Gabriele Castellano, Giovanni Neglia, Fabio Pianese, and Andrea Araldo,
    Proc. of the 19th Mediterranean Communication and Computer Networking Conference (MedComNet 2021), online conference, June 15-17, 2021
  • GRADES: Gradient Descent for Similarity Caching [preprint]
    Anirudh Sabnis, Tareq Si Salem, Giovanni Neglia, Michele Garetto, Emilio Leonardi, and Ramesh K. Sitaraman,
    Proc. of the IEEE International Conference on Computer Communications (INFOCOM 2021), online conference, May 10-13, 2021
  • No-Regret Caching via Online Mirror Descent [preprint]
    Tareq Si Salem, Giovanni Neglia, and Stratis Ioannidis, Proc. of the IEEE International Conference on Communications (ICC 2021), Montreal, Canada, online conference, June 2021
  • Similarity Caching: Theory and Algorithms [preprint]
    Michele Garetto, Emilio Leonardi, and Giovanni Neglia,
    Proc. of the IEEE International Conference on Computer Communications (INFOCOM 2020), Toronto, Canada, July 2020
    A longer version is available on arXiv
