The Companion Robot NAO

 Description | Videos | Papers | Links



The first NAO robot is equipped with a VGA-resolution stereoscopic camera pair and four microphones. This robot was used in the EU project HUMAVIPS (2010-2013).


This version of NAO has a stereoscopic camera pair as well as a spherical array of twelve microphones. This robot was used in the EU project EARS (2014-2017).

NAO v5 with a stereoscopic camera pair and four microphones. Robot programming is performed using NAOLab. This robot is used in the VHIA project (2014-2019).


The PERCEPTION team selected the companion robot NAO for experimenting and demonstrating various audio-visual skills as well as for developing the concept of a social robot that is able to recognize human presence, to understand people’s gestures and speech, and to communicate by synthesizing appropriate behavior. The main challenge of our team is to enable human-robot interaction in the real world.

The humanoid robot NAO is manufactured by Aldebaran Robotics, now SoftBank Robotics. Standing, the robot is roughly 60 cm tall, 35 cm when sitting, and approximately 30 cm wide. NAO includes two CPUs. The first, placed in the torso together with the batteries, controls the motors and hence provides kinematic motion with 26 degrees of freedom. The other CPU is placed in the head and is in charge of managing the proprioceptive sensing, the communications, and the audio-visual sensors (two cameras and four microphones, in our case). NAO’s on-board computing resources can be accessed via either wired or wireless communication protocols.

NAO’s commercially available head is equipped with two cameras arranged along a vertical axis: these cameras are neither synchronized nor do they share a significant common field of view. Hence, they cannot be used for stereo vision. Within the EU project HUMAVIPS, Aldebaran Robotics developed a binocular camera system arranged horizontally, which makes it possible to implement stereo vision algorithms on NAO. In particular, one can take advantage of both the robot’s cameras and microphones: the cameras deliver VGA sequences of image pairs at 12 FPS, while the sound card delivers the audio signals arriving from all four microphones, sampled at 48 kHz. Subsequently, Aldebaran developed a second binocular camera system to fit into the head of NAO v5.
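From the rates quoted above, one can work out how the two streams line up in time: each stereo image pair spans a fixed window of audio samples. The following sketch is purely illustrative (it is not NAO's actual API) and only uses the 12 FPS and 48 kHz figures given in the text:

```python
# Illustrative sketch (not NAO's actual API): aligning the audio and video
# streams from the rates quoted above -- stereo image pairs at 12 FPS and
# four microphone channels sampled at 48 kHz.

VIDEO_FPS = 12        # stereo image pairs per second
AUDIO_RATE = 48_000   # samples per second, per microphone

# Number of audio samples (per microphone) that span one video frame.
samples_per_frame = AUDIO_RATE // VIDEO_FPS   # 48000 / 12 = 4000

def audio_window_for_frame(frame_index):
    """Return the [start, end) sample range covering video frame `frame_index`."""
    start = frame_index * samples_per_frame
    return start, start + samples_per_frame

# Example: the audio window matching the 10th image pair.
start, end = audio_window_for_frame(10)
print(start, end)  # 40000 44000
```

This kind of bookkeeping is what makes joint audio-visual processing (e.g. associating a detected face with the sound it emits) possible on synchronized streams.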

In order to manage the information flow gathered by all these sensors, we implemented our software on top of the Robotics Service Bus (RSB). RSB is a platform-independent, event-driven middleware specifically designed for the needs of distributed robotic applications. Several RSB tools are available, including real-time software execution, as well as tools to record the event/data flow and to replay it later, so that application development can be done off-line. RSB events are automatically equipped with several time stamps for introspection and synchronization purposes. RSB was chosen because it allows our software to run on a remote PC platform, without the performance and deployment restrictions imposed by the robot’s CPUs. Moreover, the software packages can easily be reused for other robots.
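The pattern described above (scoped publish/subscribe events carrying time stamps, with a recorded log that can be replayed off-line) can be sketched in a few lines. This is a toy illustration of the idea, not the real RSB API; the scope name is invented for the example:

```python
# Minimal sketch of the event-driven pattern described above -- NOT the real
# RSB API. Events carry payloads plus time stamps so that subscribers can
# synchronize and introspect the data flow, and recorded events can be
# replayed later for off-line development.

import time
from collections import defaultdict

class EventBus:
    """A toy publish/subscribe bus; scopes play the role of RSB scopes."""

    def __init__(self):
        self._handlers = defaultdict(list)
        self.log = []  # recorded events, replayable later

    def subscribe(self, scope, handler):
        self._handlers[scope].append(handler)

    def publish(self, scope, payload):
        # Each event is automatically stamped at publication time.
        event = {"scope": scope, "payload": payload, "stamp": time.time()}
        self.log.append(event)
        for handler in self._handlers[scope]:
            handler(event)

    def replay(self):
        """Re-deliver all recorded events, e.g. for off-line development."""
        for event in self.log:
            for handler in self._handlers[event["scope"]]:
                handler(event)

# Usage: a module on a remote PC listening to a (hypothetical) camera scope.
bus = EventBus()
frames = []
bus.subscribe("/nao/camera/stereo", lambda e: frames.append(e["payload"]))
bus.publish("/nao/camera/stereo", "image-pair-0")
print(frames)  # ['image-pair-0']
```

Because publishers and subscribers only share scope names, the same modules can run on the robot or on a remote PC unchanged, which is the decoupling the text attributes to RSB.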

More recently (2015-2016), the PERCEPTION team started the development of NAOLab, a middleware for hosting robotic applications written in C, C++, Python and Matlab, using the computing power on board NAO augmented with a networked PC.


3D reconstruction with NAO’s stereoscopic camera pair at 10 frames/second:

Audiovisual heading. NAO turns its head towards a detected face that emits a sound:


Jan Cech, Ravi Mittal, Antoine Deleforge, Jordi Sanchez-Riera, Xavier Alameda-Pineda and Radu Horaud. Active-Speaker Detection and Localization with Microphones and Cameras Embedded into a Robotic Head. IEEE International Conference on Humanoid Robots (HUMANOIDS’13), Oct 2013, Atlanta, USA.


Jordi Sanchez-Riera, Xavier Alameda-Pineda, Johannes Wienke, Antoine Deleforge, Soraya Arias, Jan Cech, Sebastian Wrede and Radu Horaud. Online Multimodal Speaker Detection for Humanoid Robots. IEEE International Conference on Humanoid Robots (HUMANOIDS’12), Nov 2012, Osaka, Japan.


Maxime Janvier, Xavier Alameda-Pineda, Laurent Girin and Radu Horaud. Sound-Event Recognition with a Companion Humanoid. IEEE International Conference on Humanoid Robots (HUMANOIDS’12), Nov 2012, Osaka, Japan.


Fabien Badeig, Quentin Pelorson, Soraya Arias, Vincent Drouard, Israel Dejene Gebru, Xiaofei Li, Georgios Evangelidis and Radu Horaud. A Distributed Architecture for Interacting with NAO. International Conference on Multimodal Interaction, Nov 2015, Seattle, WA, USA.

Xiaofei Li, Laurent Girin, Fabien Badeig and Radu Horaud. Reverberant Sound Localization with a Robot Head Based on Direct-Path Relative Transfer Function. IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct 2016, Daejeon, South Korea, pp. 2819-2826.


