Research: Semantic Segmentation - Exploring Efficient Inference in Video



We have developed a video inference framework based on mean-field updates that considers higher-order cliques in addition to the pairwise potentials. The key idea is to combine the best of two worlds: (1) semantic co-labeling and (2) more expressive models. We presented this work at the CVPR 2015 workshop WiCV and at ISOCC 2015. The approach achieved state-of-the-art performance on the CamVid dataset. Code for Efficient Video Inference is available on GitHub.

Initial Work

Fast and efficient inference in dense CRFs was proposed in [2]. We have implemented efficient inference in a spatio-temporal setting, where inference is performed jointly on a group of video frames without additional time overhead. Apart from some tweaks of our own, this idea is based on [3]. The code for video inference can be found on GitHub.
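As a rough illustration of the mean-field update behind this kind of inference, here is a minimal Python sketch. It is not the released code and not the filter-based method of [2]: it builds the full pairwise kernel explicitly, so it only scales to a handful of pixels, and it leaves out higher-order terms. The function names, the Potts label compatibility, the single Gaussian kernel over (x, y, t) and all parameter values are assumptions made for this example.

```python
import numpy as np


def softmax(x):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)


def meanfield_video(unary, feats, compat_weight=1.0, bandwidth=1.0, n_iters=5):
    """Naive mean-field inference for a fully connected pairwise CRF.

    unary: (N, L) unary energies, lower means more likely.
    feats: (N, D) per-pixel features, here (x, y, t) so that pixels
           from several frames are connected in one model.
    Returns an (N, L) array of approximate marginals Q.
    """
    # Gaussian kernel between every pair of pixels (assumed form).
    diff = feats[:, None, :] - feats[None, :, :]
    kernel = np.exp(-0.5 * np.sum(diff ** 2, axis=-1) / bandwidth ** 2)
    np.fill_diagonal(kernel, 0.0)  # a pixel sends no message to itself

    q = softmax(-unary)  # initialise from the unaries alone
    for _ in range(n_iters):
        # msg[i, l] = sum_j k(i, j) * Q_j(l)
        msg = kernel @ q
        # With a Potts model, the pairwise energy of label l equals a
        # per-pixel constant minus msg[i, l], so agreement is rewarded.
        q = softmax(-unary + compat_weight * msg)
    return q


# Toy "video": 2 frames of 2x2 pixels, 2 labels.
t, y, x = np.meshgrid(np.arange(2), np.arange(2), np.arange(2), indexing="ij")
feats = np.stack([x.ravel(), y.ravel(), t.ravel()], axis=1).astype(float)

unary = np.tile([0.0, 2.0], (8, 1))  # every pixel prefers label 0 ...
unary[3] = [0.2, 0.0]                # ... except one weak outlier

q = meanfield_video(unary, feats)
labels = q.argmax(axis=1)
```

In this toy setup the outlier pixel prefers label 1 on its own, and the messages from its spatial and temporal neighbours pull it back to label 0. Because the time index is just another feature dimension, labeling a group of frames jointly requires no change to the update itself.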


The problem of semantic labeling of image sequences is to classify each pixel of an image into the proper semantic category, such as sky, tree, road or human. We have evaluated ALE (2010) from Oxford Brookes University, TextonBoost (2006) from Microsoft Research and the Darwin library (2012) from the Australian National University for multiclass image segmentation. Currently we are looking at improving video multiclass segmentation.

The image on the right shows a semantic labeling of two images from the CamVid dataset; the legend is shown below. The CamVid dataset has 32 object classes.


We use the MSRC-21, Sowerby and CamVid datasets. We found that ALE performs best in terms of accuracy.
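Accuracy for multiclass segmentation is commonly reported as overall pixel accuracy, mean per-class accuracy or mean intersection-over-union. Which of these the comparison above used is not stated here, so the following Python sketch simply computes all three from a confusion matrix. The function names and the ignore label value of 255 are assumptions made for this example.

```python
import numpy as np


def confusion_matrix(gt, pred, n_classes, ignore_label=255):
    """Count (ground truth, prediction) label pairs over all valid pixels.

    Pixels whose ground-truth label equals ignore_label (assumed to mark
    unlabeled regions) are skipped.
    """
    mask = gt != ignore_label
    idx = gt[mask].astype(np.int64) * n_classes + pred[mask].astype(np.int64)
    counts = np.bincount(idx, minlength=n_classes ** 2)
    return counts.reshape(n_classes, n_classes)


def segmentation_scores(conf):
    """Return (overall accuracy, mean per-class accuracy, mean IoU)."""
    tp = np.diag(conf).astype(float)
    gt_count = conf.sum(axis=1)    # pixels per ground-truth class
    pred_count = conf.sum(axis=0)  # pixels per predicted class
    overall = tp.sum() / conf.sum()
    with np.errstate(divide="ignore", invalid="ignore"):
        per_class = tp / gt_count
        iou = tp / (gt_count + pred_count - tp)
    # Classes that never occur give NaN and are left out of the means.
    return overall, np.nanmean(per_class), np.nanmean(iou)


# Toy 2x2 label maps with two classes.
gt = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])

conf = confusion_matrix(gt, pred, n_classes=2)
overall, mean_acc, mean_iou = segmentation_scores(conf)
```

For a video dataset, the confusion matrices of all frames can be summed before calling `segmentation_scores`, so that every pixel carries equal weight.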

Experiments on MSRC-21

On MSRC-21, we found that ALE performs best in terms of accuracy.

comparative performance on MSRC-21

Experiments on CamVid

We are looking at improving video multiclass segmentation. Below is a set of examples from the CamVid dataset: the original image sequences, the ground-truth sequences and the output sequences from a slightly modified ALE program.

original video frames
semantic segmentation ground truth
semantic segmentation output


[1] L’ubor Ladický et al. Joint Optimisation for Object Class Segmentation and Dense Stereo Reconstruction. BMVC 2010.

[2] Philipp Krähenbühl and Vladlen Koltun. Parameter Learning and Convergent Inference for Dense Random Fields. ICML 2013.

[3] José M. Alvarez, Mathieu Salzmann and Nick Barnes. Large-Scale Semantic Co-Labeling of Image Sets. IEEE Winter Conference on Applications of Computer Vision (WACV), 2014.

[4] Vibhav Vineet, Jonathan Warrell and Philip H. S. Torr. Filter-Based Mean-Field Inference for Random Fields with Higher-Order Terms and Product Label-Spaces. International Journal of Computer Vision, 110(3):290-307, 2014.