Since the Sept. 11, 2001, attacks on the World Trade Center and the Pentagon, video camera networks have proliferated in the U.S. and abroad, appearing everywhere from airports to border crossings to city streets. Today more than 30 million surveillance cameras produce nearly 4 billion hours of video footage each week, far more than human analysts can process. Even where software sifts the data for suspicious activity, its algorithms are not always up to the task, especially in busy urban areas.
Recognizing these mounting challenges, two ECE researchers, Professor Janusz Konrad and Associate Professor Venkatesh Saligrama, together with Pierre-Marc Jodoin, an assistant professor of computer science at the University of Sherbrooke in Canada, have devised a new automated method for processing video data and pinpointing potential security risks. The method is much faster and more reliable than conventional techniques. The team reports on the research in the lead article of the September 2010 issue of IEEE Signal Processing Magazine.
The article presents a new statistical approach for detecting unusual objects or events, such as abandoned packages or illegal vehicle maneuvers, even in highly cluttered urban environments. Rather than classify and track objects in a video stream, as most video surveillance software does, this approach breaks footage down into a sequence of snapshots, compares pixels in successive snapshots for subtle changes, and uses statistical methods to identify and locate pixel-level changes that depart from normal activity in the monitored scene. The anomalies it flags could then be tracked with conventional software.
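To make the pixel-comparison step concrete, here is a minimal Python sketch that turns successive grayscale frames into binary per-pixel change maps. It assumes simple frame differencing with a fixed, hypothetical threshold of 25 intensity levels; the researchers' actual change-detection step may differ, and the function names are illustrative only.

```python
from typing import List

Frame = List[List[int]]  # grayscale intensities, one inner list per image row


def motion_labels(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[int]]:
    """Label each pixel 1 (changed) or 0 (unchanged) by thresholding the
    absolute intensity difference between two successive frames.
    The default threshold is a hypothetical choice, not from the article."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def label_sequence(frames: List[Frame], threshold: int = 25) -> List[List[List[int]]]:
    """Convert a list of frames into binary change maps, one per pair of
    consecutive frames (so N frames yield N - 1 maps)."""
    return [motion_labels(a, b, threshold) for a, b in zip(frames, frames[1:])]


if __name__ == "__main__":
    f0 = [[10, 10], [10, 10]]
    f1 = [[10, 90], [10, 10]]  # top-right pixel brightens sharply
    print(motion_labels(f0, f1))  # [[0, 1], [0, 0]]
```

Working on raw pixel differences avoids any need to segment or identify objects, which is the property the researchers emphasize for crowded scenes.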
“Typical approaches entail tagging, identifying and tracking every single object, but in an urban setting with too many moving objects, you can’t track them all,” said Saligrama. “Instead of tagging and tracking objects, our idea is to collect pixel-level statistics and monitor variations over time. Using cameras with embedded algorithms, we’ve shown that pixel-level anomaly detection can work.”
The method works by labeling activity at each pixel in a video frame as either moving (represented by a “1”) or still (represented by a “0”). Over time, a run of consecutive 1s across a set of adjacent pixels signifies a busy period; a run of 0s denotes an idle period. Conventional machine-learning techniques can then be applied to this binary data to establish a baseline of typical activity in a given space, enabling the software to flag events that depart from that baseline.
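The busy/idle encoding lends itself to simple per-pixel statistics. The sketch below is not the researchers' algorithm: it stands in for the unspecified "conventional machine-learning techniques" with a deliberately simple rule. It measures busy periods (runs of 1s) and the fraction of time a pixel is active in normal training footage, then flags a test window whose activity rate shifts by more than a hypothetical tolerance or whose longest busy period exceeds anything seen during training.

```python
from typing import List


def busy_periods(labels: List[int]) -> List[int]:
    """Return the lengths of consecutive runs of 1s (busy periods)
    in one pixel's binary motion history."""
    runs, current = [], 0
    for bit in labels:
        if bit:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:  # history ended during a busy period
        runs.append(current)
    return runs


def activity_rate(labels: List[int]) -> float:
    """Fraction of time steps in which the pixel was labeled as moving."""
    return sum(labels) / len(labels) if labels else 0.0


def is_anomalous(train: List[int], window: List[int], tolerance: float = 0.3) -> bool:
    """Flag a test window that departs from the training baseline.
    Both criteria and the default tolerance are hypothetical choices."""
    rate_shift = abs(activity_rate(window) - activity_rate(train)) > tolerance
    longest_train = max(busy_periods(train), default=0)
    longest_window = max(busy_periods(window), default=0)
    return rate_shift or longest_window > longest_train


if __name__ == "__main__":
    train = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # "normal" history for one pixel
    print(busy_periods(train))  # [2, 1]
    print(is_anomalous(train, [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]))  # True: 3-step busy run
```

A deployed detector would more likely update such counts incrementally and pool them over neighboring pixels, as the text's "set of adjacent pixels" suggests, rather than store whole histories.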
The method’s speed, accuracy and minimal memory requirements (the algorithms can run inside surveillance cameras rather than on centralized servers) have drawn favorable attention in industry circles. A paper on the method that the team published in the April edition of SPIE Professional was listed as the issue’s most downloaded article, and potential industry partners have begun discussing possible collaborations with the researchers. Saligrama and his colleagues have recently applied for a patent through Boston University.
Drawing on funding from the National Science Foundation, Department of Homeland Security, National Geospatial-Intelligence Agency, and Office of Naval Research, the team next plans to refine its method to account for different time scales.
“While an event may be considered anomalous on a shorter time scale, it may be quite normal on a longer scale, and vice versa,” observed Konrad. “For example, the congestion on the Massachusetts Turnpike under the Photonics Building at 8:33 a.m. on a weekday is likely to be considered normal when compared to traffic at 8:35 a.m. or 8:45 a.m. However, most likely it would be considered anomalous when compared to traffic at 10:33 a.m. We are currently developing new multi-scale models and classification methods to address this.”
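Konrad's turnpike example can be expressed as a comparison against baselines of different widths. In this hypothetical Python sketch, activity levels are indexed by minute of the day (8:33 a.m. is minute 513), and the numbers in the usage example are invented for illustration, not measurements. At a 15-minute scale, 8:33 a.m. resembles 8:35 and 8:45 a.m.; at a two-hour scale, the quieter 10:33 a.m. reading pulls the baseline down and the same observation is flagged. This is a toy illustration of the idea, not the multi-scale models the team is developing.

```python
from typing import Dict


def is_anomalous_at_scale(profile: Dict[int, float], t: int, half_width: int,
                          tolerance: float = 0.2) -> bool:
    """Compare activity at minute `t` with the mean activity of all other
    minutes within `half_width` minutes of it. Tolerance is hypothetical."""
    neighbors = [rate for m, rate in profile.items()
                 if m != t and abs(m - t) <= half_width]
    if not neighbors:
        return False  # no reference data at this scale
    baseline = sum(neighbors) / len(neighbors)
    return abs(profile[t] - baseline) > tolerance


def multi_scale_report(profile: Dict[int, float], t: int,
                       scales: Dict[str, int]) -> Dict[str, bool]:
    """Evaluate the same observation at several time scales."""
    return {name: is_anomalous_at_scale(profile, t, hw) for name, hw in scales.items()}


if __name__ == "__main__":
    # Invented activity levels (fraction of busy pixels) keyed by minute of day.
    profile = {513: 0.85, 515: 0.80, 525: 0.75, 633: 0.30}
    print(multi_scale_report(profile, 513, {"15 min": 15, "2 h": 120}))
    # {'15 min': False, '2 h': True}
```

The same observation receives opposite verdicts depending only on the reference window, which is exactly the ambiguity Konrad describes.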
Even as they refine the method to account for such issues, the researchers do not expect it to eliminate the need for human observers entirely.
“I don’t envision removing the human out of surveillance,” said Saligrama, “but reducing the amount of human attention that’s needed.”