
Spatio-temporal Matching for Human Pose Estimation

Detection and tracking of humans in three videos using STM.

STM extracts trajectories in the video (gray lines) and selects a subset of trajectories (a) that match the 3D motion capture model (b) learned from the CMU Motion Capture Database.


Introduction

Detecting and tracking humans in video are long-standing problems in computer vision. Most successful approaches (e.g., deformable part models) rely heavily on discriminative models, which build appearance detectors for body joints, and on generative models, which constrain the possible body configurations. While these 2D models have been applied successfully to images (and with less success to videos), a major challenge is generalizing them across camera viewpoints. To achieve view invariance, these 2D models typically require large amounts of training data covering many views, which is difficult to gather and time-consuming to label. Unlike existing 2D models, this paper formulates human detection in video as spatio-temporal matching (STM) between a 3D motion capture model and trajectories in the video. Our algorithm estimates the camera view and selects the subset of tracked trajectories that matches the motion of the 3D model. STM is solved efficiently with linear programming and is robust to tracking mismatches, occlusions, and outliers.
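To make the idea of "estimate the view, then select matching trajectories with a linear program" concrete, here is a heavily simplified, hypothetical sketch in Python using NumPy and `scipy.optimize.linprog`. It is not the formulation from [1] or [2]. It assumes an orthographic camera that varies only in azimuth, 2D trajectories already expressed in the model's coordinate frame (no translation or scale to estimate), and a plain squared-distance cost; the names `project`, `match_lp`, and `stm_sketch` are illustrative only. For each candidate view it projects the 3D joint trajectories, scores every joint/trajectory pair, and solves the LP relaxation of a bipartite matching in which each joint takes exactly one trajectory and each trajectory is used at most once, so unmatched trajectories act as outliers.

```python
import numpy as np
from scipy.optimize import linprog


def project(joints3d, azimuth):
    """Orthographically project 3D joints (J, T, 3) to 2D (J, T, 2) after
    rotating by `azimuth` radians about the vertical (y) axis.
    Assumption: simple orthographic camera looking along z."""
    c, s = np.cos(azimuth), np.sin(azimuth)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return (joints3d @ R.T)[..., :2]  # keep x and y


def match_lp(cost):
    """LP relaxation of assigning each of J joints to one of N trajectories,
    with each trajectory used at most once. Returns (assignment, total_cost)."""
    J, N = cost.shape
    c = cost.ravel()  # variable x[j, n] lives at index j * N + n
    A_eq = np.zeros((J, J * N))
    for j in range(J):
        A_eq[j, j * N:(j + 1) * N] = 1.0  # sum_n x[j, n] == 1
    A_ub = np.zeros((N, J * N))
    for n in range(N):
        A_ub[n, n::N] = 1.0  # sum_j x[j, n] <= 1
    res = linprog(c, A_ub=A_ub, b_ub=np.ones(N), A_eq=A_eq, b_eq=np.ones(J),
                  bounds=(0, 1), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    # Bipartite matching constraints are totally unimodular, so the LP has an
    # integral optimum; argmax reads off the selected trajectory per joint.
    x = res.x.reshape(J, N)
    return x.argmax(axis=1), res.fun


def stm_sketch(joints3d, tracks2d, azimuths):
    """Try each candidate azimuth, match the projected model to the tracks,
    and keep the view with the lowest matching cost.
    Returns (best_azimuth, assignment, total_cost)."""
    best = None
    for az in azimuths:
        proj = project(joints3d, az)  # (J, T, 2)
        diff = proj[:, None] - tracks2d[None]  # (J, N, T, 2)
        # cost[j, n]: mean squared 2D distance between joint j and track n
        cost = (diff ** 2).sum(axis=-1).mean(axis=-1)
        assignment, total = match_lp(cost)
        if best is None or total < best[2]:
            best = (az, assignment, total)
    return best
```

Searching a discrete set of views and solving one small LP per view keeps the sketch easy to follow; the published method handles camera estimation, temporal structure, and outliers in a more involved way (see [2]).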

Videos

This spotlight summarizes the main problem and our contributions.

Download the [Video 40MB].

This video shows the details of computing STM on three datasets: the CMU Motion Capture Database, the Berkeley MHAD Database, and the Human3.6M Database.

Download the [Video 51MB].

Video of the first sequence shown in Fig. 8a of [2].

Download the [Video 6MB].

Video of the second sequence shown in Fig. 8a of [2].

Download the [Video 7MB].

Video of the third sequence shown in Fig. 8a of [2].

Download the [Video 22MB].

Video of the first sequence shown in Fig. 11a of [2].

Download the [Video 128MB].

Video of the second sequence shown in Fig. 11a of [2].

Download the [Video 149MB].

Publications

[1]
Spatio-temporal Matching for Human Detection in Video
European Conference on Computer Vision (ECCV), 2014
F. Zhou and F. De la Torre
[Paper 20MB] [Slides 96MB]

[2]
Spatio-temporal Matching for Human Pose Estimation in Video
IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 38(8):1492-1504, 2016
F. Zhou and F. De la Torre
[Paper 25MB]