Research Spotlight Archive


Title: Coastal Video Surveillance

Participants: Daniel Cullen (MS ’12), Professors Janusz Konrad (ECE), and Thomas Little (ECE)

Funding: National Science Foundation, MIT SeaGrant “Consortium for Ocean Sensing of the Nearshore Environment”

Status: Ongoing (2011-Present)

Background: The monitoring of coastal environments is of great interest to biologists, ecologists, environmentalists, and law enforcement officials. For example, marine biologists would like to know whether humans have come too close to seals on a beach, and law enforcement officials would like to know how many people and cars have been on the beach and whether they have disturbed the fragile sand dunes. Given the large areas to monitor and the wide range of goals, an obvious sensing modality is a video camera. However, with 100+ hours of video recorded by each camera per week, a search for salient events by human operators is not sustainable. Furthermore, automated video analysis of maritime scenes is very challenging due to background activity (e.g., water reflections and waves) and a very large field of view.

Case study: The beach on Great Point, Nantucket, Massachusetts

Summary: The goal of this research is to develop an approach that analyzes the video data and distills hours of video down to a few short segments containing only the salient events, allowing human operators to expeditiously study a coastal scene. We propose a practical approach to the detection of three types of salient events, namely boats, motor vehicles, and people appearing close to the shoreline, and to their subsequent summarization. This choice of objects of interest is dictated by our application, but our approach is general and can be applied in other scenarios as well. As illustrated in the diagram, our approach consists of three main steps: object detection, object classification, and video summarization. First, the object detection block performs background subtraction to identify regions of interest, followed by behavior subtraction to suppress statistically stationary motion (e.g., ocean waves), and then connected-components analysis to find bounding rectangles around the regions of interest. Next, covariance-matrix-based object classification labels each region of interest as a car, a boat, a person, or none of the above. Finally, video condensation by ribbon carving generates video summaries of each salient object, using the classified regions of interest as the input cost data. Our system is efficient and robust, as shown in the results below.
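The detection front end described above — background subtraction followed by connected-components analysis — can be sketched roughly as follows, assuming grayscale NumPy frames. This is an illustrative simplification: the behavior-subtraction stage is omitted, and the function name and thresholds are ours, not from the paper.

```python
import numpy as np
from collections import deque

def detect_regions(frame, background, thresh=30, min_area=50):
    """Background subtraction + connected-components analysis.

    Returns bounding rectangles (x, y, w, h) of foreground blobs
    whose pixel count is at least min_area.
    """
    # Background subtraction: mark pixels that differ from the model.
    mask = np.abs(frame.astype(int) - background.astype(int)) > thresh
    seen = np.zeros_like(mask, bool)
    boxes = []
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # BFS flood fill of one 4-connected component.
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                ys, xs = [sy], [sx]
                while q:
                    y, x = q.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                            ys.append(ny)
                            xs.append(nx)
                if len(xs) >= min_area:
                    boxes.append((min(xs), min(ys),
                                  max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes
```

In the real system, a statistical background model (and the behavior-subtraction step) replaces the single static background image used here.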

Block diagram of the proposed coastal surveillance system

Results: We tested the effectiveness of our approach on long videos taken at Great Point, Nantucket, Massachusetts. Shown below are sample frames that illustrate the output of each processing step. The two columns show results from two different video sequences.


Output of subsequent processing steps


A few more examples of the object classification step are shown below. Blue identifies detections of boats, red identifies cars, and green identifies people.
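The covariance-descriptor idea behind the classification step can be sketched as follows: per-pixel features (position, intensity, gradient magnitudes) are pooled into a 5×5 covariance matrix per region, and a region is labeled by its nearest class prototype. The Frobenius distance used below is a stand-in — metrics tailored to covariance matrices (e.g., based on generalized eigenvalues) are normally preferred — and all names here are illustrative, not from the paper.

```python
import numpy as np

def region_covariance(patch):
    """5x5 covariance descriptor of a grayscale patch.

    Per-pixel feature vector: (x, y, intensity, |dI/dy|, |dI/dx|).
    """
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gy, gx = np.gradient(patch.astype(float))
    feats = np.stack([xs.ravel().astype(float), ys.ravel().astype(float),
                      patch.ravel().astype(float),
                      np.abs(gy).ravel(), np.abs(gx).ravel()])
    return np.cov(feats)

def classify(patch, prototypes):
    """Label a patch by the nearest class prototype covariance matrix.

    Uses Frobenius distance for simplicity (see lead-in caveat).
    """
    c = region_covariance(patch)
    return min(prototypes, key=lambda k: np.linalg.norm(c - prototypes[k]))
```

Note that the covariance descriptor is invariant to a uniform brightness shift, since adding a constant to the intensity feature does not change its covariance.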

Results of salient event detection and classification

The amount of summarization that we can achieve varies greatly with the amount of activity in the scene. However, even for a sequence with substantial activity, the boats-only cost function achieved an almost 20x reduction in frame count. The table below gives summarization results for one video sequence.

Results for video containing boats and people.
Input: 38 minutes long at 5 fps, 640×360 resolution.

Cost function for          Number of frames after each step           Condensation
video condensation         input   flex 0  flex 1  flex 2  flex 3     ratio (flex 3)
Boats only                 11379     1752     928     723     600     18.97:1
People only                11379     3461    2368    1746    1285      8.85:1
Boats or people            11379     4908    3253    2504    1897      5.99:1
Behavior subtraction       11379    11001    8609    8147    7734      1.47:1
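The condensation ratios in the table are simply the input frame count divided by the frame count after the final (flex 3) pass, which can be checked with a few lines of illustrative arithmetic:

```python
# Recompute the condensation ratios: input frames / flex-3 frames.
table = {
    "Boats only":           (11379, 600),
    "People only":          (11379, 1285),
    "Boats or people":      (11379, 1897),
    "Behavior subtraction": (11379, 7734),
}
for name, (n_in, n_flex3) in table.items():
    print(f"{name}: {n_in / n_flex3:.2f}:1")
```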

We designed our approach with computational efficiency in mind. The table below shows execution time benchmark results. As we can see, video condensation is by far the most time-consuming step.

Processing Step Average Execution Time
Background Subtraction 0.292 sec/frame
Behavior Subtraction 0.068 sec/frame
Object Detection 0.0258 sec/frame
Video Condensation flex 0 0.034 sec/frame
Video Condensation flex 1 2.183 sec/frame
Video Condensation flex 2 1.1229 sec/frame
Video Condensation flex 3 0.994 sec/frame
Total for all steps: 5.058 sec/frame


Publications:

  • D. Cullen, J. Konrad, and T. Little, “Detection and Summarization of Salient Events in Coastal Environments,” in Proceedings of the IEEE International Conference on Advanced Video and Signal-Based Surveillance, September 2012.
  • D. Cullen, “Detecting and Summarizing Salient Events in Coastal Videos,” Tech. Rep. 2012-06 (Master’s project), Boston University, Department of Electrical & Computer Engineering, May 2012.
