MAMMALS

Memory-Augmented Models for low-latency Machine Learning Services (MAMMALS)

A machine learning (ML) model is often trained for inference’s purposes, that is to classify specific inputs (e.g., images) or predict numerical values (e.g., the future position of a vehicle). The ubiquitous deployment of ML in time-critical applications and unpredictable environments poses fundamental challenges to ML inference. Big cloud providers, such as Amazon, Microsoft, and Google, offer their “machine learning as a service” solutions, but running the models in the cloud may fail to meet the tight delay constraints (≤10 ms) of future 5G services, e.g., for connected and autonomous cars, industrial robotics, mobile gaming, augmented and virtual reality. Such requirements can only be met by running ML inference directly at the edge of the network—directly on users’ devices or at nearby servers—without the computing and storage capabilities of the cloud. Privacy and data ownership also call for inference at the edge.

MAMMALS is an Inria exploratory action. It investigates new approaches to run inference under tight delay constraints and with limited resources. In particular, it aims to provide low-latency inferences by running—close to the end user—simple ML models that can also take advantage of a (small) local datastore of examples. The focus is on algorithms to learn online what to store locally to improve inference quality and adapt to the specific context.

Period: 2020-2023.

Follow-ups: MAMMALS research directions are now developed in dAIEDGE (a EU network of excellence for distributed, trustworthy efficient and scalable AI at the edge) and in FedMalin (Inria collaborative project on federated machine learning over the Internet).

Members

Giovanni Neglia, principal investigator
Gabriele Castellano, postdoc funded by Nokia Bell Labs
Francescomaria Faticanti, postdoc
Othmane Marfoq, PhD student
Tareq Si Salem, PhD student

We have a postdoc position to fill. If you are interested send an email to Giovanni Neglia.

Collaborators

at UMass – Amherst, USA: Ramesh Sitaraman, Anirudh Sabnis
at Northeastern University, USA: Stratis Ioannidis Yuanyuan Li,
at University and Polytechnic of Turin, Italy: Michele Garetto and Emilio Leonardi
at University of Verona, Italy: Damiano Carra
at Nokia Bell Labs, France: Fabio Pianese, Tianzhu Zhang
at Accenture Labs, France: Laetitia Kameni, Richard Vidal

Highlights

Emilio Leonardi, Giovanni Neglia, and Thrasyvoulos Spyropoulos gave a tutorial on Similarity Caching at ACM Sigmetrics 2021
Best paper award at ITC-33 for the paper AÇAI: Ascent Similarity Caching with Approximate Indexes

Publications

Towards Inference Delivery Networks: Distributing Machine Learning with Optimality Guarantees [editor], [preprint]
Tareq Si Salem, Gabriele Castellano, Giovanni Neglia, Fabio Pianese, and Andrea Araldo, IEEE/ACM Transaction on Networking, online, August 31, 2023
No-Regret Caching via Online Mirror Descent [editor], [preprint]
Tareq Si Salem, Giovanni Neglia, and Stratis Ioannidis, ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), Volume 8, Issue 4, Article No. 11, August 11 2023
Optimistic Online Caching for Batched Requests
Francescomaria Faticanti, Giovanni Neglia, IEEE International Conference on Communications (ICC), Rome, Italy, 28 May-1 June, 2023
Enabling Long-term Fairness in Dynamic Resource Allocation [preprint]
Tareq Si Salem, George Iosifidis, Giovanni Neglia, ACM SIGMETRICS 2023, Orlando, Florida, USA, June 19-23, 2023
Computing the Hit Rate of Similarity Caching [preprint]
Younes Ben Mazziane, Sara Alouf, Giovanni Neglia, Daniel Sadoc Menasche, IEEE Global Communications Conference (GLOBECOM), Rio de Janeiro, Brazil, December 4-8, 2022
Ascent Similarity Caching with Approximate Indexes [editor], [extended]
Tareq Si Salem, Giovanni Neglia, Damiano Carra, IEEE/ACM Transactions on Networking, online, November, 2022
GRADES: Gradient Descent for Similarity Caching [editor]
Anirudh Sabnis, Tareq Si Salem, Giovanni Neglia, Michele Garetto, Emilio Leonardi, and Ramesh K. Sitaraman, IEEE/ACM Transactions on Networking, online, November, 2022
Regularized Bottleneck with Early Labeling [preprint]
Gabriele Castellano, Fabio Pianese, Damiano Carra, Tianzhu Zhang, Giovanni Neglia, ITC 2022 – 34th International Teletraffic Congress, Shenzhen, China, September 14-16, 2022
Personalized Federated Learning through Local Memorization [preprint]
Othmane Marfoq, Laetitia Kameni, Richard Vidal, Giovanni Neglia, International Conference on Machine Learning (ICML), July, 2022
Online Caching Networks with Adversarial Guarantees [editor], [preprint]
Yuanyuan Li, Tareq Si Salem, Giovanni Neglia, Stratis Ioannidis, ACM SIGMETRICS / IFIP PERFORMANCE 2022, Mumbai, India June 6-10, 2022
Content Placement in Networks of Similarity Caches [editor], [preprint]
Michele Garetto, Emilio Leonardi, Giovanni Neglia, Elsevier Computer Networks, online, November 2021
Similarity Caching: Theory and Algorithms [editor], [preprint]
Giovanni Neglia, Michele Garetto, and Emilio Leonardi, IEEE/ACM Transactions on Networking, online, December 2021
Taking two Birds with one k-NN Cache [preprint]
Damiano Carra and Giovanni Neglia, Proc. of The 2021 IEEE Global Communications Conference (Globecom 2021), Madrid, Spain, December 7-11, 2021
AÇAI: Ascent Similarity Caching with Approximate Indexes [preprint], [extended]
Tareq Si Salem, Giovanni Neglia, Damiano Carra, Proc. of the 33rd International Teletraffic Congress (ITC-33), online conference, August 31st-September 3rd, 2021. BEST PAPER AWARD
Towards Inference Delivery Networks: Distributing Machine Learning with Optimality Guarantees [preprint]
Tareq Si Salem, Gabriele Castellano, Giovanni Neglia, Fabio Pianese, and Andrea Araldo,
Proc. of the 19th Mediterranean Communication and Computer Networking Conference (MedComNet 2021), online conference, June 15-17, 2021
GRADES: Gradient Descent for Similarity Caching [preprint]
Anirudh Sabnis, Tareq Si Salem, Giovanni Neglia, Michele Garetto, Emilio Leonardi, and Ramesh K. Sitaraman,
Prof. of the IEEE International Conference on Computer Communications (INFOCOM 2021), online conference, May 10-13, 2021
No-Regret Caching via Online Mirror Descent [preprint]
Tareq Si Salem, Giovanni Neglia, and Stratis Ioannidis, IEEE International Conference on Communications (ICC). Montreal, Canada, online conference, June 2021.
Similarity Caching: Theory and Algorithms [preprint]
Michele Garetto, Emilio Leonardi, and Giovanni Neglia,
Proc. of the IEEE International Conference on Computer Communications (INFOCOM 2020), Toronto, Canada, July, 2020
A longer version is available on arXiv

Memory-Augmented Models for low-latency Machine Learning Services (MAMMALS)

Members

Collaborators

Highlights

Publications

Mammals’ Similarity Caching Traces Repository