Recognition and Localization of Food in Cooking Videos

Examples of food-state localization on keyframes of the 50 Salads dataset.


In this paper, we describe experiments with techniques for locating foods and recognizing food states in cooking videos.
We describe the production of a new dataset that provides annotated images for food types and food states. We compare results from two techniques for detecting food types and food states, and then show that recognizing type and state with separate classifiers improves recognition results. We then use this result to detect composite activation maps for food types. The results provide a promising first step towards the construction of narratives for cooking actions.




Architecture: Food concept maps.




Food object segmentations for some keyframes of the 50 Salads dataset are available here.

Evaluation dataset: keyframes from the 50 Salads dataset, 340 keyframes in total, annotated with food instance segmentation. Download here (img_n_annotations_v2.tar).
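
Once the archive is downloaded, it can be unpacked with the Python standard library. The sketch below is only an illustration: the helper name `extract_archive` and the destination directory are hypothetical, and because the internal layout of img_n_annotations_v2.tar is not documented on this page, the function makes no assumption about it and simply lists whatever files it finds.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a tar archive into dest_dir and return the sorted relative file paths.

    Hypothetical helper: no particular layout of images or annotations is assumed.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path) as tar:
        # Reject members that would be written outside dest_dir.
        for member in tar.getmembers():
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)
    # List every extracted file relative to the destination directory.
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())
```

For example, `extract_archive("img_n_annotations_v2.tar", "50salads_keyframes")` would unpack the archive into a local folder (the folder name is an arbitrary choice) and return the list of extracted files, which is a quick way to see how images and annotations are organized before writing a loader.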



CEA18@ECCV18 [paper], [slides]

  • Nachwa Aboubakr
  • Remi Ronfard
  • James Crowley


author = {Aboubakr, Nachwa and Ronfard, Remi and Crowley, James},
title = {Recognition and Localization of Food in Cooking Videos},
booktitle = {Proceedings of the Joint Workshop on Multimedia for Cooking and Eating Activities and Multimedia Assisted Dietary Management},
series = {CEA/MADiMa '18},
year = {2018},
isbn = {978-1-4503-6537-6},
location = {Stockholm, Sweden},
pages = {21--24},
numpages = {4},
url = {},
doi = {10.1145/3230519.3230590},
acmid = {3230590},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {object localization, weakly supervised learning},
