Action4D: Real-time Action Recognition in the Crowd and Clutter

Quanzeng You and Hao Jiang

Microsoft, USA

Recognizing every person's action in a crowded and cluttered environment is a challenging task. In this paper, we propose a real-time action recognition method, Action4D, which gives reliable and accurate results in real-world settings. We tackle the action recognition problem using a holistic 4D "scan" of a cluttered scene that includes every detail about the people and the environment. Our real-time 4D scan generation method gives a solid model of the scene. Recognizing multiple people's actions in this cluttered 4D representation is a new problem, and we propose novel methods to solve it. First, we propose a new method to track people in the cluttered 4D volume, which reliably detects and follows each person in real time. Second, we propose a new deep neural network, the Action4D-Net, to recognize the action of each tracked person. The Action4D-Net's novel structure uses both a global feature and focused attention to achieve state-of-the-art results. Our real-time method is invariant to camera view angles, resistant to clutter, and able to handle crowds. The experimental results show that the proposed method is fast, reliable, and accurate. Our method paves the way to action recognition in real-world applications and is ready to be deployed to enable smart homes, smart factories, and smart stores.

Videos

Tracking

Action Ground Truth One (a)

Action Ground Truth One (b)

Action Ground Truth One (c)

Action Ground Truth Two (a)

Action Ground Truth Two (b)

Action Ground Truth Two (c)

Action Ground Truth Two (d)