Layered representation of video (such as in [1]) is a useful mid-level video processing tool. Like several other methods of layered representation, we begin with oversegmentations and group them, based on their geometric distances, across the levels of a hierarchy to form distinguishable affine motion layers.

The oversegmentation can be of different types: block-level, superpixel-level (such as streamGBH [2]), or based on any clustering method such as the stability-based segmentation approach [3]. For each segment, a 6-dimensional affine parameter vector is estimated from the 2-D optical flow at its constituent pixel locations using RANSAC [4].
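A minimal sketch of the per-segment affine fit described above, assuming the flow model [u, v] = [x, y, 1] P with a 3x2 parameter matrix P. The function names, the iteration count, and the inlier tolerance are illustrative assumptions, not values from the paper.

```python
import numpy as np

def fit_affine(xy, flow):
    """Least-squares fit of the 6 affine parameters: flow ~= [x, y, 1] @ P, P is 3x2."""
    A = np.column_stack([xy, np.ones(len(xy))])
    P, *_ = np.linalg.lstsq(A, flow, rcond=None)
    return P

def ransac_affine(xy, flow, n_iter=200, tol=0.5, seed=0):
    """RANSAC over minimal 3-pixel samples; n_iter and tol are assumed values."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([xy, np.ones(len(xy))])
    best_inliers = np.zeros(len(xy), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(xy), size=3, replace=False)
        P = fit_affine(xy[idx], flow[idx])
        # Residual between the flow predicted by this model and the observed flow
        err = np.linalg.norm(A @ P - flow, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        best_inliers[:] = True  # fall back to all pixels if no consensus was found
    # Refit on the consensus set to get the final parameters
    return fit_affine(xy[best_inliers], flow[best_inliers]), best_inliers
```

Here `xy` is an (N, 2) array of pixel coordinates in one segment and `flow` is the (N, 2) optical flow at those pixels.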

We aim to estimate motion layers by merging supervoxels or over-segmented regions whose spatio-temporal features differ only slightly. We propose a merging criterion for two regions based on affine flow. The geometric distance between affine regions is built on a warping-error-based dissimilarity, the directed divergence. For a pair of regions i and k, two quantities are involved: div(k,i), the directed divergence of region k from region i, and div(i,k), the directed divergence of region i from region k.
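To make the two directed quantities concrete, here is a rough sketch. It assumes div(k,i) is the mean error obtained when the pixels of region k are explained by the affine motion of region i, and it uses the flow residual as a simple stand-in for the warping error. The `should_merge` rule (both directions below a threshold) and the threshold value are hypothetical; the exact definitions are in the paper.

```python
import numpy as np

def affine_flow(P, xy):
    """Flow predicted by a 3x2 affine parameter matrix P at pixel coordinates xy."""
    return np.column_stack([xy, np.ones(len(xy))]) @ P

def directed_divergence(xy_k, flow_k, P_i):
    """Assumed div(k, i): mean error of region k's flow under region i's motion."""
    residual = affine_flow(P_i, xy_k) - flow_k
    return float(np.mean(np.linalg.norm(residual, axis=1)))

def should_merge(xy_i, flow_i, P_i, xy_k, flow_k, P_k, thresh=0.5):
    """Hypothetical criterion: merge only if both directed divergences are small."""
    d_ki = directed_divergence(xy_k, flow_k, P_i)
    d_ik = directed_divergence(xy_i, flow_i, P_k)
    return max(d_ki, d_ik) < thresh
```

Under this assumed definition the two directions generally differ, because each one is evaluated on a different set of pixels.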

We use a first-fit (local greedy) strategy for region grouping. Our assumption is that if regions i, j, and k should all be merged, then the merging criterion is satisfied for each of the pairs (i,j), (j,k), and (i,k). Whichever pair we start with, we eventually arrive at the same grouping. The directed divergence and the distance metric are visualized in the Figure. Note that the directed divergence is not symmetric.
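The first-fit grouping described above can be sketched as follows. This is one possible reading, not the released implementation: each region joins the first existing group whose representative passes a pairwise test, and `can_merge` is a hypothetical callback standing in for the affine-flow merging criterion.

```python
def first_fit_grouping(n_regions, can_merge):
    """Assign each region to the first group whose representative it can merge with."""
    groups = []  # each group is a list of region indices; g[0] is its representative
    for r in range(n_regions):
        for g in groups:
            if can_merge(g[0], r):
                g.append(r)
                break
        else:
            # No existing group fits, so region r starts a new group
            groups.append([r])
    return groups

# Toy usage: regions carry a scalar "motion" and merge when the values are close.
motions = [0.0, 5.0, 0.1, 5.2, 0.05]
groups = first_fit_grouping(len(motions), lambda a, b: abs(motions[a] - motions[b]) < 0.5)
```

When the pairwise criterion holds for every pair inside a true group, as assumed in the text, comparing against a single representative is enough and the order of visiting regions does not change the partition.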

The levels of the hierarchy show the motion scene at different granularities of distinguishable motion. As the hierarchy level increases, regions following affine motions are grouped into a smaller number of foreground objects.

After region merging, significant errors remain, especially near object boundaries. To overcome this problem, we explore Markov Random Field (MRF) based smoothing as described in [5]. Since the number and parameters of the candidate motions are known at each hierarchy level, we apply the BVZ algorithm [6] to obtain smoother segments. In the MRF framework, we minimize the pixel-wise warping error subject to a prior smoothness constraint on a pixel-grid graph.
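As an illustration of the objective being minimized, the sketch below evaluates an energy of the common form E(f) = sum_p D_p(f_p) + lambda * sum_(p,q) [f_p != f_q] on a 4-connected pixel grid. The Potts smoothness term, the weight `lam`, and the array layout are assumptions for illustration; the code only scores a labeling and does not perform the graph-cut optimization of [6].

```python
import numpy as np

def mrf_energy(labels, data_cost, lam=1.0):
    """Energy of a labeling.

    labels:    (H, W) integer array, motion label per pixel.
    data_cost: (H, W, L) array, e.g. warping error of each pixel under each of L motions.
    lam:       assumed weight of the Potts smoothness term.
    """
    h, w = labels.shape
    rows, cols = np.indices((h, w))
    # Data term: cost of the label actually assigned to each pixel
    data = data_cost[rows, cols, labels].sum()
    # Potts term: count label changes between horizontal and vertical neighbors
    smooth = (labels[:, 1:] != labels[:, :-1]).sum() + (labels[1:, :] != labels[:-1, :]).sum()
    return float(data + lam * smooth)
```

A lower energy corresponds to a labeling that both explains the observed motion and has fewer label changes between neighboring pixels.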

There are two paradigms for enforcing **temporal consistency**. We use a per-frame discretized flow histogram, so the association of a motion segment from the previous frame to the current frame is not inherent. To enforce temporal consistency in a simple and effective way, we use the segmentation obtained by backward warping of the previous frame-pair as the initialization for the segmentation process (forward warping) of the current frame-pair. The question of associating segments over time thus reduces to associating segments between the forward and backward warping of a single frame-pair. The best matching segment in the second image, in terms of overlapping area with the warped segment from the first image, establishes the segment correspondence between the forward and backward warped motion segmentations.
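The overlap-based matching in the last step can be sketched as below, assuming both segmentations are integer label maps of the same size and that the first one has already been warped into the second image. The function name and the dictionary output are illustrative choices.

```python
import numpy as np

def match_segments(warped_labels, target_labels):
    """Map each segment id in warped_labels to the target segment with the largest overlap."""
    matches = {}
    for s in np.unique(warped_labels):
        # Target labels of the pixels covered by warped segment s
        ids, counts = np.unique(target_labels[warped_labels == s], return_counts=True)
        matches[int(s)] = int(ids[np.argmax(counts)])
    return matches
```

Overlap is counted in pixels, so a large warped segment may match the same target segment as a small one; resolving such many-to-one cases is left out of this sketch.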

Our paper on motion layer segmentation has been accepted for publication at WACV 2014. The download link: paper. Source code can be found on GitHub.

Results on other videos - at level 3

[1] J. Y. A. Wang and E. H. Adelson, "Representing moving images with layers," IEEE Trans. Image Processing, vol. 3, pp. 625–638, Sep 1994.

[2] Chenliang Xu, Caiming Xiong, and Jason J. Corso, "Streaming hierarchical video segmentation," in ECCV, 2012.

[3] A. Rabinovich, T. Lange, J. Buhmann, and S. Belongie, "Model order selection and cue combination for image segmentation," in CVPR, 2006.

[4] Martin A. Fischler and Robert C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381-395, June 1981.

[5] Josh Wills, Sameer Agarwal, and Serge Belongie, "A feature-based approach for dense segmentation and estimation of large disparity motion," International Journal of Computer Vision, vol. 68, no. 2, pp. 125-143, 2006.

[6] Yuri Boykov, Olga Veksler, and Ramin Zabih, "Fast approximate energy minimization via graph cuts," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222-1239, 2001.