Privacy Preserving Smart-Room Analytics

Team: J. Dai, J. Wu, B. Saghafi, J. Konrad, P. Ishwar
Funding: This material is based upon work supported by the US National Science Foundation under Smart Lighting ERC Cooperative Agreement No. EEC-0812056.
Status: Ongoing (2014-…)

Summary: Although extensive research on action recognition has been carried out using standard video cameras, little work has explored recognition performance at extremely low temporal or spatial camera resolutions. Reliable action recognition in such a “degraded” environment would promote the development of privacy-preserving smart rooms that could interact intelligently with their occupants while mitigating privacy concerns. This project explores the trade-off between action recognition performance, the number of cameras, and temporal and spatial resolution in a smart-room environment.

A seminar room simulated in Unity3D with 5 ceiling-mounted cameras in a pentagonal arrangement.

As it is impractical to build a physical platform to test every combination of camera positions and resolutions, we use a graphics engine (Unity3D) to simulate a room with various avatars animated using motions captured from real subjects with a Kinect v2 sensor.

Snapshots from various camera viewpoints of an avatar raising his arm.

We study the impact on recognition performance of spatial resolution (from a single pixel up to an extremely low 10×10 pixels), of temporal resolution (from 2 Hz up to 30 Hz), and of the number of ceiling-mounted cameras (up to 5).

Extremely low spatial resolution snapshots
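To make this degradation concrete, the sketch below shows how a full-resolution clip can be reduced to such extreme resolutions by frame skipping and block averaging. It is a minimal illustration in Python/NumPy, not our actual rendering or processing pipeline, and the function name and parameters are purely illustrative.

```python
import numpy as np


def degrade(frames, fps_in=30, fps_out=2, out_size=(10, 10)):
    """Simulate an extremely low-resolution camera.

    frames : (T, H, W) array of grayscale frames captured at fps_in.
    Returns frames temporally subsampled to fps_out and spatially
    averaged down to out_size (e.g. (1, 1) for a single-pixel camera).
    """
    # Temporal subsampling: keep every (fps_in // fps_out)-th frame.
    step = max(1, fps_in // fps_out)
    frames = frames[::step]

    # Spatial downsampling by block averaging.
    T, H, W = frames.shape
    h, w = out_size
    frames = frames[:, : H - H % h, : W - W % w]  # trim to a multiple of the block size
    frames = frames.reshape(T, h, H // h, w, W // w).mean(axis=(2, 4))
    return frames


# Example: degrade a 3-second, 30 Hz, 480x640 clip to a 2 Hz single-pixel stream.
clip = np.random.rand(90, 480, 640)
low_res = degrade(clip, fps_in=30, fps_out=2, out_size=(1, 1))
print(low_res.shape)  # (6, 1, 1)
```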

Results of this study indicate that reliable recognition of smart-room-centric gestures is possible at extremely low temporal and spatial resolutions. An overview of these results is given below:

Results
Using 5 single-pixel cameras at 30 Hz, we achieve a correct classification rate (CCR) of 75.70% across 9 actions, only 13.9% lower than the CCR for the same camera setup at 10×10-pixel resolution. In terms of impact on recognition performance, spatial resolution matters most, followed by the number of cameras, and finally temporal resolution (frame rate).
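For illustration, here is a minimal sketch of how such multi-camera, extremely low-resolution data could be fed to a classifier: each camera's degraded frames are flattened into a normalized pixel time series, the series from all cameras are concatenated into a single feature vector, and an off-the-shelf SVM is trained. This is only a schematic in Python/scikit-learn with synthetic data; it is not necessarily the feature representation or classifier used in our paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score


def feature_vector(cams):
    """cams : list of (T, h, w) degraded clips, one per camera,
    all trimmed/resampled to the same length T.
    Returns a 1-D feature vector: the concatenated, per-camera
    normalized pixel time series."""
    feats = []
    for clip in cams:
        x = clip.reshape(len(clip), -1)        # (T, h*w) time series
        x = (x - x.mean()) / (x.std() + 1e-8)  # per-camera normalization
        feats.append(x.ravel())
    return np.concatenate(feats)


# Synthetic stand-in data: 100 clips, 5 single-pixel cameras, 60 frames each, 9 actions.
rng = np.random.default_rng(0)
X = np.stack([feature_vector([rng.random((60, 1, 1)) for _ in range(5)])
              for _ in range(100)])
y = np.arange(100) % 9

clf = SVC(kernel="rbf", C=1.0)
print(cross_val_score(clf, X, y, cv=5).mean())  # chance-level on random data
```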

Additional resources on this project will be added soon.

For a more in-depth explanation of our methodology and the results above, please refer to our paper below.

Publications:

  1. J. Dai, J. Wu, B. Saghafi, J. Konrad, and P. Ishwar, “Towards Privacy-Preserving Activity Recognition Using Extremely Low Temporal and Spatial Resolution Cameras,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Workshop on Analysis and Modeling of Faces and Gestures (AMFG), June 2015.