Automated behavior recognition in humans

Observation of human behavior is a key ingredient of psychology, ergonomics, movement science and consumer studies. However, manual video annotation is a laborious and error-prone process, so tools are needed that automatically detect, track and interpret human body movements. We employ computer vision, signal processing and machine learning techniques to develop novel tools for the automatic detection of specific behavioral states in humans. Current projects focus on video-based person tracking, pose estimation, gesture tracking and the recognition of eating-related behaviors. The research within this theme focuses on developing tools that capture human motion without affecting the behavior of the subject(s), while coping with the major challenges in computer vision, such as illumination changes and occlusions.


The development of solutions for automated behavior recognition will have a great impact on the objectivity, quality and efficiency of human behavior research. Further research and development of our tools will lead to generic building blocks for solutions in various research contexts and real-life settings.

Tracking the ground floor position of multiple persons

The main challenge of tracking multiple persons is handling occlusion. Even in an uncrowded scene, a person may be only partly visible because another person is in front of him or her at a certain time. Our proposed solution uses multiple calibrated cameras and an appearance model of each subject. The subjects are first tracked in the 2D camera views. Only the cameras that see a subject best are then used to determine that subject's position on the ground floor. This research has resulted in our prototype PeopleTracker, which can track up to 4 persons using the images of 4 cameras.
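The camera-selection step above can be sketched as follows. This is a minimal illustration, not the actual PeopleTracker implementation: the per-camera homographies, the visibility scores and the choice of averaging the k best views are all illustrative assumptions.

```python
import numpy as np

def to_ground_plane(H, pt):
    """Map a 2D image point to the ground plane via homography H."""
    x = H @ np.array([pt[0], pt[1], 1.0])
    return x[:2] / x[2]

def ground_position(detections, homographies, scores, k=2):
    """Estimate a subject's ground-floor position from multiple cameras.

    detections:   per-camera 2D foot positions (image coordinates)
    homographies: per-camera 3x3 image-to-ground homographies
    scores:       per-camera visibility scores (hypothetical, e.g. how
                  well the appearance model matches); higher = less occluded

    Only the k best-scoring cameras contribute, mirroring the idea of
    using only the cameras that see the subject best.
    """
    best = np.argsort(scores)[-k:]                    # k least-occluded views
    pts = [to_ground_plane(homographies[i], detections[i]) for i in best]
    return np.mean(pts, axis=0)                       # fuse on the ground plane
```

In a calibrated setup the homographies would come from the camera calibration; here they are simply assumed to be given.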


Contour tracking

The silhouette of a person provides valuable cues about the subject's appearance and shape, which are essential for pose estimation. Background subtraction is a standard way to segment a person from the background, but it requires static cameras and is sensitive to illumination changes and shadows. We follow a more advanced approach based on pixel-wise posteriors [1]. Once an initial segmentation is available, we track it over time using a level-set-based contour tracker. Adjusting the segmentation with a so-called level-set function removes the need for a static background or a static camera, but it is time-consuming. Therefore, the segmentation is restricted to a small area inside a bounding box defined such that the subject (or a specific body part) lies inside it. This bounding box is tracked as a rigid object, which is fast; the segmentation is then carried out within it.

[1] C. Bibby and I. Reid, Robust Real-Time Visual Tracking using Pixel-Wise Posteriors, In Proceedings of the 10th European Conference on Computer Vision, Marseille, France, 2008.
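As an illustration of the pixel-wise posterior idea of [1], the sketch below computes a per-pixel foreground probability from color histograms. The histogram representation and the uniform foreground prior are simplifying assumptions; the full method of [1] embeds these posteriors in a level-set framework.

```python
import numpy as np

def pixelwise_posterior(frame_bins, fg_hist, bg_hist, prior_fg=0.5):
    """Per-pixel foreground posterior from appearance histograms.

    frame_bins: integer array (H, W) with a color-bin index per pixel
    fg_hist:    normalized color histogram of the subject (foreground model)
    bg_hist:    normalized color histogram of the background

    Returns P(foreground | pixel color) for every pixel via Bayes' rule.
    """
    p_fg = fg_hist[frame_bins] * prior_fg            # P(color | fg) * P(fg)
    p_bg = bg_hist[frame_bins] * (1.0 - prior_fg)    # P(color | bg) * P(bg)
    return p_fg / (p_fg + p_bg + 1e-12)              # normalize per pixel
```

In the approach described above, this map would only be evaluated inside the tracked bounding box, which keeps the per-frame cost low.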

Multimodal activity recognition

Human behavior consists of a series of actions, and to automate behavior measurements these actions must be detected. We investigate discriminative models, which do not use any knowledge about the human body. This research is carried out within the iCareNet project, with a focus on eating behavior. In a typical example, we observe a person who is eating or drinking and try to identify when he or she takes a bite or drinks from a glass. To improve performance, additional sensors besides cameras may be used, as long as they do not affect the subject's behavior.
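A minimal sketch of discriminative action detection on a single sensor stream is shown below. The sliding-window features and the nearest-centroid classifier are illustrative choices, not the actual iCareNet pipeline; note that the model learns only from labeled examples and uses no body model, which is what makes it discriminative.

```python
import numpy as np

def window_features(signal, win, step):
    """Slice a 1-D sensor signal (e.g. a wrist accelerometer magnitude)
    into overlapping windows and compute simple statistics per window."""
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        feats.append([w.mean(), w.std(), w.max() - w.min()])
    return np.array(feats)

class NearestCentroid:
    """Minimal discriminative classifier: assigns each feature window
    to the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0)
                                    for c in self.classes_])
        return self

    def predict(self, X):
        # Euclidean distance from every window to every class centroid
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None], axis=2)
        return self.classes_[np.argmin(d, axis=1)]
```

A real system would add features from multiple modalities (video plus wearable sensors) and a stronger classifier, but the window-then-classify structure stays the same.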


Publications

  • Aa, N.P. van der, Noldus, L.P.J.J. & Veltkamp, R.C. (2012). Video-Based Multi-person Human Motion Capturing. In Proceedings of Measuring Behavior 2012, 8th International Conference on Methods and Techniques in Behavioral Research (Utrecht, The Netherlands, August 28-31, 2012) (pp. 75-78).
  • Resodikromo, J. (2012). Markerless 3D Pose Estimation. M.Sc. thesis, Utrecht University.
  • Aa, N.P. van der, Luo, X., Giezeman, G.J., Tan, R.T. & Veltkamp, R.C. (2011). Utrecht Multi-Person Motion (UMPM) benchmark: a multi-person dataset with synchronized video and motion capture data for evaluation of articulated human motion and interaction. In IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011.
  • Evers, F. (2012). 2D pose estimation in the Restaurant of the Future. M.Sc. thesis, Utrecht University.

Links to projects

This research is carried out in the framework of the following collaborative projects: 

  • GATE: Game Research for Training and Education
  • iCareNet: Marie Curie Initial Training Network on Healthcare, Wellness and Assisted Living
  • WPSS: Watching People Security Services