Memory-Augmented Models for low-latency Machine Learning Services (MAMMALS)
A machine learning (ML) model is often trained for inference purposes, that is, to classify specific inputs (e.g., images) or predict numerical values (e.g., the future position of a vehicle). The ubiquitous deployment of ML in time-critical applications and unpredictable environments poses fundamental challenges to ML inference. Big cloud providers, such as Amazon, Microsoft, and Google, offer their “machine learning as a service” solutions, but running the models in the cloud may fail to meet the tight delay constraints (≤10 ms) of future 5G services, e.g., for connected and autonomous cars, industrial robotics, mobile gaming, and augmented and virtual reality. Such requirements can only be met by running ML inference at the edge of the network—on users’ devices or at nearby servers—without the computing and storage capabilities of the cloud. Privacy and data ownership also call for inference at the edge.
MAMMALS is an Inria exploratory action. It investigates new approaches to run inference under tight delay constraints and with limited resources. In particular, it aims to provide low-latency inference by running—close to the end user—simple ML models that can also take advantage of a (small) local datastore of examples. The focus is on algorithms that learn online what to store locally in order to improve inference quality and adapt to the specific context.
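The core idea can be illustrated with a toy sketch: a classifier backed by a small, capacity-limited local store of (feature, label) examples, answering queries by nearest-neighbor lookup and deciding online which new examples are worth keeping. The admission rule below (store an example only if no sufficiently similar one is already cached, evicting the least-recently-used entry when full) is one illustrative policy, not the project's actual algorithms; all names and parameters are hypothetical.

```python
import math
from collections import OrderedDict

def dist(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class LocalMemoryClassifier:
    """Toy memory-augmented classifier with a small local datastore.

    Inference: 1-nearest-neighbor lookup over the stored examples.
    Online update (illustrative only): admit a new labeled example when
    its nearest stored neighbor is farther than `threshold`; when the
    store is full, evict the least-recently-used entry.
    """

    def __init__(self, capacity=4, threshold=0.5):
        self.capacity = capacity
        self.threshold = threshold
        self.store = OrderedDict()  # feature tuple -> label

    def predict(self, x):
        """Return the label of the nearest stored example (None if empty)."""
        if not self.store:
            return None
        key = min(self.store, key=lambda k: dist(k, x))
        self.store.move_to_end(key)  # mark as recently used
        return self.store[key]

    def observe(self, x, y):
        """Decide online whether the labeled example (x, y) is worth storing."""
        x = tuple(x)
        if self.store and min(dist(k, x) for k in self.store) <= self.threshold:
            return  # a similar example is already stored; skip it
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)  # evict least-recently-used
        self.store[x] = y
```

In the project's setting, the exact lookup above would typically be replaced by an approximate nearest-neighbor index, and the admission/eviction policy by algorithms with formal guarantees (see the similarity-caching publications below).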
Period: 2020-2023.
Follow-ups: MAMMALS research directions are now developed in dAIEDGE (an EU network of excellence for distributed, trustworthy, efficient, and scalable AI at the edge) and in FedMalin (an Inria collaborative project on federated machine learning over the Internet).
Members
- Giovanni Neglia, principal investigator
- Gabriele Castellano, postdoc funded by Nokia Bell Labs
- Francescomaria Faticanti, postdoc
- Othmane Marfoq, PhD student
- Tareq Si Salem, PhD student
We have a postdoc position to fill. If you are interested, send an email to Giovanni Neglia.
Collaborators
- at UMass – Amherst, USA: Ramesh Sitaraman, Anirudh Sabnis
- at Northeastern University, USA: Stratis Ioannidis and Yuanyuan Li
- at University and Polytechnic of Turin, Italy: Michele Garetto and Emilio Leonardi
- at University of Verona, Italy: Damiano Carra
- at Nokia Bell Labs, France: Fabio Pianese, Tianzhu Zhang
- at Accenture Labs, France: Laetitia Kameni, Richard Vidal
Highlights
- Emilio Leonardi, Giovanni Neglia, and Thrasyvoulos Spyropoulos gave a tutorial on Similarity Caching at ACM Sigmetrics 2021
- Best paper award at ITC-33 for the paper AÇAI: Ascent Similarity Caching with Approximate Indexes
Publications
- Towards Inference Delivery Networks: Distributing Machine Learning with Optimality Guarantees [editor], [preprint]
  Tareq Si Salem, Gabriele Castellano, Giovanni Neglia, Fabio Pianese, and Andrea Araldo, IEEE/ACM Transactions on Networking, online, August 31, 2023
- No-Regret Caching via Online Mirror Descent [editor], [preprint]
  Tareq Si Salem, Giovanni Neglia, and Stratis Ioannidis, ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), Volume 8, Issue 4, Article No. 11, August 11, 2023
- Optimistic Online Caching for Batched Requests
  Francescomaria Faticanti and Giovanni Neglia, IEEE International Conference on Communications (ICC), Rome, Italy, May 28-June 1, 2023
- Enabling Long-term Fairness in Dynamic Resource Allocation [preprint]
  Tareq Si Salem, George Iosifidis, and Giovanni Neglia, ACM SIGMETRICS 2023, Orlando, Florida, USA, June 19-23, 2023
- Computing the Hit Rate of Similarity Caching [preprint]
  Younes Ben Mazziane, Sara Alouf, Giovanni Neglia, and Daniel Sadoc Menasche, IEEE Global Communications Conference (GLOBECOM), Rio de Janeiro, Brazil, December 4-8, 2022
- Ascent Similarity Caching with Approximate Indexes [editor], [extended]
  Tareq Si Salem, Giovanni Neglia, and Damiano Carra, IEEE/ACM Transactions on Networking, online, November 2022
- GRADES: Gradient Descent for Similarity Caching [editor]
  Anirudh Sabnis, Tareq Si Salem, Giovanni Neglia, Michele Garetto, Emilio Leonardi, and Ramesh K. Sitaraman, IEEE/ACM Transactions on Networking, online, November 2022
- Regularized Bottleneck with Early Labeling [preprint]
  Gabriele Castellano, Fabio Pianese, Damiano Carra, Tianzhu Zhang, and Giovanni Neglia, ITC 2022 – 34th International Teletraffic Congress, Shenzhen, China, September 14-16, 2022
- Personalized Federated Learning through Local Memorization [preprint]
  Othmane Marfoq, Laetitia Kameni, Richard Vidal, and Giovanni Neglia, International Conference on Machine Learning (ICML), July 2022
- Online Caching Networks with Adversarial Guarantees [editor], [preprint]
  Yuanyuan Li, Tareq Si Salem, Giovanni Neglia, and Stratis Ioannidis, ACM SIGMETRICS / IFIP PERFORMANCE 2022, Mumbai, India, June 6-10, 2022
- Content Placement in Networks of Similarity Caches [editor], [preprint]
  Michele Garetto, Emilio Leonardi, and Giovanni Neglia, Elsevier Computer Networks, online, November 2021
- Similarity Caching: Theory and Algorithms [editor], [preprint]
  Giovanni Neglia, Michele Garetto, and Emilio Leonardi, IEEE/ACM Transactions on Networking, online, December 2021
- Taking two Birds with one k-NN Cache [preprint]
  Damiano Carra and Giovanni Neglia, Proc. of the 2021 IEEE Global Communications Conference (Globecom 2021), Madrid, Spain, December 7-11, 2021
- AÇAI: Ascent Similarity Caching with Approximate Indexes [preprint], [extended]
  Tareq Si Salem, Giovanni Neglia, and Damiano Carra, Proc. of the 33rd International Teletraffic Congress (ITC-33), online conference, August 31-September 3, 2021. BEST PAPER AWARD
- Towards Inference Delivery Networks: Distributing Machine Learning with Optimality Guarantees [preprint]
  Tareq Si Salem, Gabriele Castellano, Giovanni Neglia, Fabio Pianese, and Andrea Araldo, Proc. of the 19th Mediterranean Communication and Computer Networking Conference (MedComNet 2021), online conference, June 15-17, 2021
- GRADES: Gradient Descent for Similarity Caching [preprint]
  Anirudh Sabnis, Tareq Si Salem, Giovanni Neglia, Michele Garetto, Emilio Leonardi, and Ramesh K. Sitaraman, Proc. of the IEEE International Conference on Computer Communications (INFOCOM 2021), online conference, May 10-13, 2021
- No-Regret Caching via Online Mirror Descent [preprint]
  Tareq Si Salem, Giovanni Neglia, and Stratis Ioannidis, IEEE International Conference on Communications (ICC), Montreal, Canada, online conference, June 2021
- Similarity Caching: Theory and Algorithms [preprint]
  Michele Garetto, Emilio Leonardi, and Giovanni Neglia, Proc. of the IEEE International Conference on Computer Communications (INFOCOM 2020), Toronto, Canada, July 2020. A longer version is available on arXiv.