Paper Review

Weakly Supervised Structured Output Learning for Semantic Segmentation

(Vezhnevets, A., Ferrari, V. and Buhmann, J. M.)

By: Anurag Pandey



Stated Objective:

The authors’ objective is to build a system that predicts a class label for every pixel of a given image while being trained in a weakly supervised setting. In this setting, the system is told only which classes each training image contains, not where they appear in the image. The standard approach is to train the system in a fully supervised setting, in which every pixel is manually labelled. Because producing such annotation is very time consuming, the authors argue for a weakly supervised method.

Techniques used to solve the problem at hand:

The approach is to enable the segmentation algorithm to use multiple visual cues in the weakly supervised setting, analogous to what fully supervised methods achieve. The difficulty is that the relative usefulness of different visual cues is very hard to assess from weakly supervised training data. The authors therefore define a parametric family of structured models, in which each model weighs the visual cues differently. This reduces the problem to one of model selection. Choosing the best model is a hard optimisation problem, because the objective has no analytic gradient and has multiple local optima. They therefore formulate it as a Bayesian optimisation problem and propose an algorithm based on Gaussian processes to solve it efficiently.
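The model selection step described above can be illustrated with a small sketch of Bayesian optimisation driven by a Gaussian process. Everything in it is an assumption made for illustration: the one-dimensional parameter theta, the toy objective, the squared-exponential kernel, and the upper-confidence-bound rule for picking the next point are stand-ins, not the authors' actual model family, score, or acquisition strategy.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_query, jitter=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)  # shape (n_obs, n_query)
    mean = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

def bayes_opt(objective, n_iter=15, kappa=2.0, seed=0):
    """Maximise a black-box objective on [0, 1] without using gradients."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 201)        # candidate parameter values
    x_obs = rng.uniform(0.0, 1.0, size=2)    # two random starting points
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_iter):
        # Centre the observations so the zero-mean GP prior is reasonable.
        mean, std = gp_posterior(x_obs, y_obs - y_obs.mean(), grid)
        # Upper confidence bound: favour high predicted score or high uncertainty.
        x_next = grid[np.argmax(mean + kappa * std)]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = np.argmax(y_obs)
    return x_obs[best], y_obs[best]

def toy_objective(theta):
    """Hypothetical model-quality score with two local maxima on [0, 1]."""
    return theta * np.sin(3 * np.pi * theta)

best_theta, best_score = bayes_opt(toy_objective)
print(best_theta, best_score)
```

Each call to the objective stands for one expensive "train and evaluate a model" step, which is why the Gaussian process is used to decide where to evaluate next instead of searching the grid exhaustively.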

Innovation:

To obtain an improved appearance model of the semantic classes, two major requirements had to be fulfilled:

a) the appearance model had to be flexible and able to draw on a diverse set of visual features;

b) learning and prediction had to be efficient, because during model selection they are performed at every optimisation step.

To meet both requirements, they propose the Extremely Randomised Hashing Forest (ERHF), which can map almost any feature space into a sparse binary representation. This choice makes it possible to use a simple and efficient Naive Bayes model while still leveraging diverse feature sets. The ERHF concept is potentially useful for appearance models of semantic classes in general, not only in this paper.
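The idea of hashing features into a sparse binary code and then fitting Naive Bayes on that code can be sketched as follows. This is a simplified stand-in and not the authors' exact ERHF construction: the class name RandomHashingForest, the complete trees with uniformly random feature and threshold choices, the Bernoulli Naive Bayes with Laplace smoothing, and the two-blob toy data are all assumptions made for illustration.

```python
import numpy as np

class RandomHashingForest:
    """Maps real-valued vectors to sparse binary codes via fully random trees
    (illustrative stand-in for ERHF, not the paper's exact construction)."""

    def __init__(self, n_trees=8, depth=3, seed=0):
        self.n_trees, self.depth = n_trees, depth
        self.rng = np.random.default_rng(seed)

    def fit(self, X):
        n_nodes = 2 ** self.depth - 1  # internal nodes of a complete tree
        lo, hi = X.min(axis=0), X.max(axis=0)
        # Each internal node tests one random feature against a random threshold.
        self.feat = self.rng.integers(0, X.shape[1], size=(self.n_trees, n_nodes))
        u = self.rng.uniform(size=(self.n_trees, n_nodes))
        self.thr = lo[self.feat] + u * (hi[self.feat] - lo[self.feat])
        return self

    def transform(self, X):
        n_leaves = 2 ** self.depth
        codes = np.zeros((X.shape[0], self.n_trees * n_leaves), dtype=np.uint8)
        rows = np.arange(X.shape[0])
        for t in range(self.n_trees):
            node = np.zeros(X.shape[0], dtype=int)  # every sample starts at the root
            for _ in range(self.depth):
                go_right = X[rows, self.feat[t, node]] > self.thr[t, node]
                node = 2 * node + 1 + go_right      # children of i are 2i+1 and 2i+2
            leaf = node - (n_leaves - 1)            # leaf index in [0, n_leaves)
            codes[rows, t * n_leaves + leaf] = 1    # one active bit per tree
        return codes

def fit_bernoulli_nb(codes, labels, n_classes, alpha=1.0):
    """Bernoulli Naive Bayes with Laplace smoothing on binary codes."""
    log_prior = np.empty(n_classes)
    log_p = np.empty((n_classes, codes.shape[1]))
    log_q = np.empty_like(log_p)
    for c in range(n_classes):
        members = codes[labels == c]
        p = (members.sum(axis=0) + alpha) / (len(members) + 2 * alpha)
        log_prior[c] = np.log(len(members) / len(codes))
        log_p[c], log_q[c] = np.log(p), np.log1p(-p)
    return log_prior, log_p, log_q

def predict_bernoulli_nb(codes, model):
    log_prior, log_p, log_q = model
    b = codes.astype(float)
    scores = log_prior + b @ log_p.T + (1.0 - b) @ log_q.T
    return scores.argmax(axis=1)

# Toy data: two well-separated 2-D blobs standing in for two semantic classes.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(6.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
codes = RandomHashingForest().fit(X).transform(X)
model = fit_bernoulli_nb(codes, y, n_classes=2)
accuracy = np.mean(predict_bernoulli_nb(codes, model) == y)
print(codes.shape, accuracy)
```

Because the trees are built without looking at labels, the expensive part is done once, and fitting Naive Bayes reduces to counting active bits per class. That is the property that makes repeated learning and prediction inside an optimisation loop affordable.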

Validation through experiments:

They performed experiments on the LabelMe subset consisting of 2488 training and 200 test images. Their model outperformed the best existing weakly supervised approach. It even outperformed the fully supervised TextonBoost and reached a performance comparable to that of a modern fully supervised non-parametric method.

Performance bounds:

The most recent state-of-the-art fully supervised method still performs considerably better than the introduced method. Also, on the training set, they could recover superpixel labels with a per-class accuracy of only 56%.
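For reference, per-class accuracy is commonly computed as the mean, over classes, of the fraction of that class's items labelled correctly. The sketch below assumes this convention, which the review itself does not spell out; the function name and the tiny label arrays are made up for illustration.

```python
import numpy as np

def per_class_accuracy(true_labels, predicted_labels, n_classes):
    """Mean over classes of the fraction of that class labelled correctly."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    accuracies = []
    for c in range(n_classes):
        mask = true_labels == c
        if mask.any():  # skip classes that do not occur in the ground truth
            accuracies.append(np.mean(predicted_labels[mask] == c))
    return float(np.mean(accuracies))

# Hypothetical superpixel labels: class 0 is frequent, class 2 is rare.
true_labels = [0, 0, 0, 0, 1, 1, 2]
predicted_labels = [0, 0, 0, 1, 1, 0, 2]

# Per class: 3/4, 1/2 and 1/1, so the mean is 0.75.
score = per_class_accuracy(true_labels, predicted_labels, n_classes=3)
# Overall accuracy counts every item equally: 5 of 7 correct.
overall = float(np.mean(np.array(true_labels) == np.array(predicted_labels)))
print(score, overall)
```

Unlike overall accuracy, this measure weights rare classes as heavily as frequent ones, so it penalises a method that does well only on dominant classes.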
