IVC Software & Data

Anonymous FTP-able Software

Important Note:
Permission to use, copy, or modify this software and its documentation for educational and research purposes only and without fee is hereby granted, provided that the copyright notices appear on all copies and supporting documentation. For any other uses of this software, in original or modified form, including but not limited to distribution in whole or in part, specific prior permission must be obtained from Boston University. These programs shall not be used, rewritten, or adapted as the basis of a commercial software or hardware product without first obtaining appropriate licenses from Boston University. Boston University and the authors make no representations about the suitability of this software for any purpose. It is provided “as is” without express or implied warranty.

Unconstrained Salient Object Detection

We aim to localize salient objects with bounding boxes in unconstrained images. We propose a system that outputs a highly reduced set of detection windows, based on a CNN proposal generation model and a novel proposal subset optimization formulation. Our system significantly outperforms existing methods in localizing dominant objects.

Minimum Barrier Salient Object Detection

We propose a highly efficient, yet powerful, salient object detection method based on a fast Minimum Barrier Distance Transform algorithm. Our salient object detection method (MB) achieves state-of-the-art performance and runs at about 80 FPS using a single thread. Furthermore, a technique based on color whitening is proposed to extend our method to leverage the appearance-based backgroundness cue. This extended version (MB+) further improves the performance while still running at about 50 FPS.
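As a rough illustration of the idea behind a raster-scan minimum barrier distance transform, the sketch below approximates, for each pixel, the smallest "barrier" (maximum minus minimum intensity along a path) to a set of seed pixels. It is not the released MB code: the function name `fast_mbd`, the choice of image border pixels as seeds, the 4-connected neighborhood, and the default of three alternating passes are all assumptions made for this example, and it works on a single-channel image given as a list of lists.

```python
def fast_mbd(image, passes=3):
    """Approximate minimum barrier distance from every pixel to the image border.

    Hypothetical helper for illustration only; `image` is a 2D list of numbers.
    """
    h, w = len(image), len(image[0])
    inf = float("inf")
    dist = [[inf] * w for _ in range(h)]
    # Highest and lowest intensity seen on the best path found so far.
    hi = [row[:] for row in image]
    lo = [row[:] for row in image]

    # Assumption: border pixels are the seeds and have zero distance.
    for y in range(h):
        for x in range(w):
            if y in (0, h - 1) or x in (0, w - 1):
                dist[y][x] = 0.0

    def relax(y, x, ny, nx):
        # Try to extend the best path of neighbor (ny, nx) to pixel (y, x).
        if not (0 <= ny < h and 0 <= nx < w) or dist[ny][nx] == inf:
            return
        value = image[y][x]
        new_hi = max(hi[ny][nx], value)
        new_lo = min(lo[ny][nx], value)
        cost = new_hi - new_lo
        if cost < dist[y][x]:
            dist[y][x], hi[y][x], lo[y][x] = cost, new_hi, new_lo

    for p in range(passes):
        if p % 2 == 0:
            # Forward raster scan: use the upper and left neighbors.
            for y in range(h):
                for x in range(w):
                    relax(y, x, y - 1, x)
                    relax(y, x, y, x - 1)
        else:
            # Backward raster scan: use the lower and right neighbors.
            for y in reversed(range(h)):
                for x in reversed(range(w)):
                    relax(y, x, y + 1, x)
                    relax(y, x, y, x + 1)
    return dist
```

For a flat image with one bright pixel in the middle, the bright pixel receives a large distance and everything else stays at zero, which is the behavior a saliency map built on this distance relies on. A few scans give only an approximation of the exact transform.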

Salient Object Subitizing

People can immediately and precisely identify that an image contains 1, 2, 3 or 4 items at a glance. This phenomenon, known as Subitizing, inspires us to pursue the task of Salient Object Subitizing, i.e. predicting the existence and the number of salient objects in a scene using holistic cues. We have collected two benchmark image datasets for salient object subitizing, and include a baseline algorithm as described in: Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiaohui Shen, Brian Price and Radomir Mech, “Salient Object Subitizing”. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.

MEEM: Robust Tracking via Multiple Experts using Entropy Minimization

Software for single object model-free online tracking, where the base tracker and its historical snapshots constitute an expert ensemble, and the best expert is selected to restore the current tracker when needed based on a minimum entropy criterion. Described in: Jianming Zhang, Shugao Ma and Stan Sclaroff. “MEEM: Robust Tracking via Multiple Experts using Entropy Minimization.” In Proc. European Conf. on Computer Vision (ECCV), September 2014.

Action Recognition and Localization by Hierarchical Space-Time Segments

Software for extracting hierarchical space-time segments from videos, which can be used for action recognition and localization. Described in:

Shugao Ma, Jianming Zhang, Nazli Ikizler-Cinbis and Stan Sclaroff. “Action Recognition and Localization by Hierarchical Space-Time Segments”, In Proc. of the IEEE International Conference on Computer Vision (ICCV), 2013.

Randomized Ensemble Tracking

Software for tracking by detection via classifier ensemble, where the ensemble weight vector is updated in a Bayesian manner. Described in:

Qinxun Bai, Zheng Wu, Stan Sclaroff, Margrit Betke and Camille Monnier. “Randomized Ensemble Tracking”, In Proc. of the IEEE International Conference on Computer Vision (ICCV), 2013.

Saliency Detection: A Boolean Map Approach

Software for saliency detection, which can be used for both eye fixation prediction and salient object detection. Described in:

Jianming Zhang and Stan Sclaroff. “Saliency Detection: A Boolean Map Approach”, In Proc. of the IEEE International Conference on Computer Vision (ICCV), 2013.

Hierarchical Tracker Software

Software for tracking by detection, where the trackers are maintained as a hierarchy. The implementation is provided for pedestrian tracking in crowded scenes. Described in:

Jianming Zhang, Liliana Lo Presti and Stan Sclaroff, “Online Multi-Person Tracking by Tracker Hierarchy”, Proc. Int. Conf. on Advanced Video and Signal Based Surveillance (AVSS), 2012.

Camera Canvas Software

Camera Canvas is assistive software for people with severe motion impairments. Described in:

C. Kwan and M. Betke. “Camera Canvas: Image editing software for people with disabilities.” In Proc. 14th International Conference on Human-Computer Interaction (HCI International 2011), Orlando, Florida, July 2011.

Multiplicative Kernel-based Detector Families Software

Software for learning object detectors that can also estimate the value of a latent parameter (state). Described in:

Quan Yuan, Ashwin Thangali, Vitaly Ablavsky and Stan Sclaroff, Learning a Family of Detectors via Multiplicative Kernels, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 33, No. 3, pp 514-530, 2011.

Hand Pose Estimation Software

Software for estimating the pose and shape of a human hand in an input image.

Head Tracking

Software for estimating the 3D rigid motion (translation, rotation) of a person’s head from video.

Active Blobs

Software for region-based tracking of 2D deforming shapes in video.

Anonymous FTP-able Image Data

Layered Graphical Models dataset

This dataset contains videos used to test tracking with the layered graphical models framework, as evaluated in:

Vitaly Ablavsky and Stan Sclaroff, Layered Graphical Models for Tracking Partially-Occluded Objects, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 33(9):1758-1775, 2011.

Vehicle tracking sequences

This dataset contains videos used to test tracking with the multiplicative kernel framework, as evaluated in:

Quan Yuan, Ashwin Thangali, Vitaly Ablavsky and Stan Sclaroff, Learning a Family of Detectors via Multiplicative Kernels, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 33, No. 3, pp 514-530, 2011.

Multi-face tracking test videos

This dataset contains videos used to test tracking of multiple faces in video with the multiplicative kernel framework, as evaluated in:

Quan Yuan, Ashwin Thangali, Vitaly Ablavsky and Stan Sclaroff, Learning a Family of Detectors via Multiplicative Kernels, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 33, No. 3, pp 514-530, 2011.

Hand shape image data

These datasets contain hand shape images used in evaluation of algorithms described in:

Quan Yuan, Ashwin Thangali, Vitaly Ablavsky, and Stan Sclaroff, Multiplicative Kernels: Object Detection, Segmentation, and Pose Estimation, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June, 2008.

Dynamic background sequences

This dataset contains videos used in testing the dynamic background modeling system described in:

Jing Zhong and Stan Sclaroff, Segmenting Foreground Objects from a Dynamic Textured Background via a Robust Kalman Filter, Proc. International Conf. on Computer Vision (ICCV), 2003.

Hand image database with ground truth

This dataset contains 107,328 images rendered with computer graphics from a realistic human hand model. Ground truth for each image is available, thus enabling quantitative evaluation of articulated pose estimation algorithms. More than 200 real images of hands are also distributed with this dataset. The dataset was used in evaluating the system described in:

Vassilis Athitsos and Stan Sclaroff, Estimating 3D Hand Pose From a Cluttered Image, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Vol. 2, pp 432-439, 2003.

Labeled video sequences used as ground truth in skin color segmentation experiments

To evaluate the performance of our color segmentation system, we collected a set of 21 video sequences from nine popular DVD movies. Collected sequences vary in length from 50 to 350 frames; most, however, are in the 70 to 100 frame range. All experimental sequences were hand-labeled to provide the ground truth data for algorithm performance verification. This data was used in evaluating the system described in:

Sigal, L., and Sclaroff, S., Estimation and Prediction of Evolving Color Distributions for Skin Segmentation Under Varying Illumination, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, (CVPR), June, 2000.

Video sequences of American Sign Language (ASL)

Video of ASL sentences, taken with multiple synchronized digital cameras to capture different views of the subject. Collected at the National Center for Sign Language and Gesture Resources.

Over 70 video sequences and ground truth used in evaluation of 3D head tracking

The directories contain over 70 extended video sequences used in head tracking experiments. Ground truth for position and orientation of the head was acquired using a magnetic sensor (Flock of Birds sensor). The data was used in evaluating the system described in:

La Cascia, M., and Sclaroff, S., Fast, Reliable Head Tracking under Varying Illumination: An Approach Based on Robust Registration of Texture-Mapped 3D Models, IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), Vol. 22, No. 4, pp 322-336, 2000.

Image databases used in deformable shape-based segmentation and retrieval experiments

These are image databases used in experimental evaluation of deformable color region grouping methods for shape-based retrieval. The data was used in evaluating the system described in:

Liu, L., and Sclaroff, S., Deformable Shape Detection and Description via Model-Based Region Grouping, Proc. IEEE Conf. on Computer Vision and Pattern Recognition, (CVPR), June, 1999.

Image database used in shape-based retrieval experiments

This tar file contains 86 color images of animals (rabbits and fish), and 63 monochrome images of hand tools. Support mask images are also provided. These images were used for experiments reported in:

Sclaroff, S., Deformable Prototypes for Encoding Shape Categories in Image Databases, Pattern Recognition, 30(4):627-642, Apr., 1997.