Research: StreamGBH+


There are three paradigms in video segmentation. The first is frame-by-frame processing, in which each frame is segmented independently; it is fast, but temporal coherence is poor because no temporal information is used. The second is 3D volume processing, which models the whole video at once with bi-directional, multi-pass processing; it gives the best results, but its complexity is too high for long videos and it cannot handle streaming input. The third is stream processing, which segments the current frame based only on a few previously processed frames; this forward-only online processing yields good results and is efficient in both time and space. The state-of-the-art streaming method (streamGBH) outperforms other streaming methods and is competitive with full-video hierarchical methods; streamGBH in libsvx implements this streaming approximation framework. We have demonstrated significant improvements in segmentation accuracy and quality by augmenting streamGBH with a motion feature and bilateral filtering. We call our approach streamGBH+.
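The streaming paradigm described above can be sketched as a forward-only loop that keeps only a bounded window of recent frames in memory. The function `segment_fn` below is a hypothetical per-window segmenter standing in for the graph-based method; the window size and interface are illustrative assumptions, not the libsvx API.

```python
from collections import deque

def stream_segment(frames, segment_fn, window=3):
    """Forward-only streaming segmentation sketch.

    Each new frame is segmented using only itself and up to
    `window - 1` previously processed frames, so memory stays
    bounded regardless of video length. `segment_fn` is a
    hypothetical per-window segmenter (e.g. a graph-based method),
    not the actual libsvx implementation.
    """
    history = deque(maxlen=window - 1)  # previously processed frames
    for frame in frames:
        # Segment the current frame conditioned only on the recent past.
        labels = segment_fn(frame, list(history))
        history.append((frame, labels))
        yield labels
```

Because `history` is a bounded deque, arbitrarily long videos can be processed in constant memory, which is exactly what the full-video 3D volume paradigm cannot offer.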


As a pre-processing step, we apply bilateral filtering as edge-preserving smoothing to improve segmentation. Along with color similarity, we consider the motion similarity of voxels and the influence of motion direction on graph connectivity; using dense optical flow considerably improves segmentation results.

First, instead of connecting a voxel (i, j, t) to its 9 immediate neighbors (i+m, j+n, t-1), m, n ∈ {-1, 0, +1}, in the previous frame, we connect it to the 9 neighbors along the backward flow vector (u, v), i.e. (i+u(i,j)+m, j+v(i,j)+n, t-1). This generalizes prior grid-based volumetric approaches and is only possible with a graph representation.

Second, we use optical flow as a feature for each region during hierarchical segmentation. Since optical flow is only consistent within a frame, we use a per-frame discretized flow histogram. Unlike [Grundmann et al. CVPR 2010], which uses a SIFT-like (angle-based) motion feature representation, we propose a simpler representation: two histograms of the horizontal and vertical components of the optical flow field. This simpler approach can distinguish motions with the same direction but different magnitudes. Matching the flow descriptors of two regions then amounts to averaging the χ2 distance of their normalized per-frame flow histograms over time. Finally, we combine the χ2 distance of the normalized color histograms dc ∈ [0, 1] with the χ2 distance of the normalized flow histograms df ∈ [0, 1].
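The flow-descriptor matching above can be sketched in a few lines of numpy. The bin count, histogram range, and the mixing weight `w` are illustrative assumptions; the paper's exact parameters may differ.

```python
import numpy as np

def flow_histograms(flow, bins=16, rng=(-8.0, 8.0)):
    """Per-frame flow descriptor: two histograms, one over the
    horizontal (u) and one over the vertical (v) flow component,
    concatenated and normalized to sum to 1. `flow` has shape
    (H, W, 2). Bin count and range are illustrative choices."""
    hu, _ = np.histogram(flow[..., 0], bins=bins, range=rng)
    hv, _ = np.histogram(flow[..., 1], bins=bins, range=rng)
    h = np.concatenate([hu, hv]).astype(float)
    return h / max(h.sum(), 1e-12)

def chi2(p, q, eps=1e-12):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((p - q) ** 2 / (p + q + eps))

def region_distance(color_hist_a, color_hist_b,
                    flow_hists_a, flow_hists_b, w=0.5):
    """Combine color and flow dissimilarity for two regions.
    Flow descriptors are matched per frame and averaged over time;
    the weight `w` is a hypothetical mixing parameter."""
    dc = chi2(color_hist_a, color_hist_b)
    df = np.mean([chi2(a, b) for a, b in zip(flow_hists_a, flow_hists_b)])
    return (1.0 - w) * dc + w * df
```

Because the u and v components are binned separately by magnitude, two regions moving in the same direction at different speeds produce different descriptors, which an angle-only representation would conflate.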

Figure: original video frames and the corresponding super-pixels at hierarchy levels 1, 10, and 15.



We use the recently published benchmark dataset (Chen, xiph.org) and the video segmentation performance metrics of [Xu et al. CVPR 2012] for our quantitative experiments. This dataset is a subset of the well-known xiph.org videos, supplemented with a 24-class semantic pixel labeling (the same classes as the MSRC object-segmentation dataset). In our implementation we use a streaming window of 3 frames in all experiments, so performance is not expected to approach that of full-video processing.

The 8 videos in this set (‘Bus’, ‘Container’, ‘Garden’, ‘Ice’, ‘Paris’, ‘Salesman’, ‘Soccer’ and ‘Stefan’) are densely labeled with semantic pixel annotations and are 85 frames long each. The dataset has been used to evaluate the state-of-the-art streamGBH method, and its human-annotated labels allow us to evaluate these segmentation methods against human perception.

Objective Measures

The proposed streamGBH+ performs better than the state-of-the-art streamGBH on videos with object motion. Overall, streamGBH+ wins on boundary recall 2D, boundary recall 3D, and explained variation. As Table 1 shows, for the accuracy 2D and accuracy 3D metrics we see no difference on this dataset overall. Although streamGBH+ does not improve on the undersegmentation metrics, we currently care more about the other metrics because of our target applications in object recognition and 2D-to-3D video conversion.
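For intuition about the boundary recall metrics reported below, here is a minimal numpy sketch of 2D boundary recall: the fraction of ground-truth boundary pixels that lie within a tolerance of a segmentation boundary. The boundary definition and the tolerance `r` are common conventions assumed here; the benchmark's exact formulation may differ.

```python
import numpy as np

def boundaries(labels):
    """Boundary mask: a pixel is on a boundary if its label differs
    from its right or bottom neighbor."""
    b = np.zeros(labels.shape, dtype=bool)
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return b

def boundary_recall_2d(gt, seg, r=1):
    """Fraction of ground-truth boundary pixels within r pixels
    (Chebyshev distance) of a segmentation boundary."""
    gb, sb = boundaries(gt), boundaries(seg)
    d = sb.copy()
    for _ in range(r):  # dilate the segmentation boundary r times
        pad = np.pad(d, 1)
        d = (pad[:-2, 1:-1] | pad[2:, 1:-1] | pad[1:-1, :-2] |
             pad[1:-1, 2:] | pad[:-2, :-2] | pad[:-2, 2:] |
             pad[2:, :-2] | pad[2:, 2:] | d)
    return gb[d].sum() / max(gb.sum(), 1)
```

A segmentation whose boundaries are one pixel off the ground truth still scores 1.0 at tolerance r=1 but 0.0 at r=0, which is why a small tolerance is standard in boundary-based evaluation.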

Our paper on streamGBH+ has been accepted for publication at WACV 2014 and is available for download.

The source code is available on GitHub.

Table 1. Comparative objective values for the Chen dataset.

Metric                        streamGBH   streamGBH & motion   streamGBH & BLF   streamGBH+
Boundary recall 2D            0.44        0.442                0.451             0.452
Boundary recall 3D            0.47        0.482                0.49              0.49
Explained variation           0.71        0.72                 0.73              0.75
Accuracy 2D                   0.58        0.58                 0.58              0.58
Accuracy 3D                   0.56        0.54                 0.55              0.55
Undersegmentation error 2D    8           9                    9                 10
Undersegmentation error 3D    18          18                   15                18


[1] C. Xu, C. Xiong, and J. J. Corso. Streaming hierarchical video segmentation. In Proceedings of European Conference on Computer Vision, 2012.