Assignment 2

Anant Raj              10086

PAPER NAME - A Generative Model for 3D Urban Scene Understanding from Movable Platforms

Authors - Raquel Urtasun, Andreas Geiger and Martin Lauer

Introduction :

Scene understanding has long been an active area of research in computer vision, for an obvious reason: unless a system can understand the 3D scene around it, autonomous systems and manipulators cannot be designed successfully. We therefore need a general approach to understanding 3D scenes, but we still lack a model that generalizes to every scenario. Models have been designed for object detection and scene segmentation, but they are too specific to generalize. The paper [1], "Object Detection with Discriminatively Trained Part Based Models" by Pedro F. Felzenszwalb, Ross B. Girshick, David McAllester and Deva Ramanan, describes an object detection approach based on mixtures of multiscale deformable part models. Scene understanding by segmenting the scene into semantic labels is described in the paper [2], "Segmentation-Based Urban Traffic Scene Understanding" by Andreas Ess, Tobias Mueller, Helmut Grabner and Luc van Gool. They describe a patch-wise image classification approach, extract features from the resulting meta representation, and finally apply classifiers to distinguish the various scenes. These models, however, do not transfer to new scenes and work well only in the context they were designed for. The level of understanding produced by object detection and image segmentation without higher-level reasoning tells relatively little about the underlying scene structure [3] (as argued in the paper "A Generative Model for 3D Urban Scene Understanding from Movable Platforms" itself). So the paper by Raquel Urtasun, Andreas Geiger and Martin Lauer proposes a principled generative model of 3D urban scenes that takes into account dependencies between static and dynamic features. Using this approach they are able to recover the 3D layout of the road, determine which places are free or occupied, detect objects, and keep track of traffic activities as well.

Urban Scene Understanding, Learning and Inference :

This section gives a brief account of the probabilistic generative model for 3D scenes and of learning and inference in it. Any image of an urban scene contains two kinds of content: static elements (e.g. roads, buildings, shops) and dynamic, moving objects (e.g. humans, cars). Static features are represented as a 2D grid recording whether each place is occupied or vacant, and dynamic features as a 3D flow composed of a 2D location and a velocity [3]. The scene itself is parametrized by a number of random variables (e.g. the angles between neighboring streets, the width of the streets). A joint distribution over these random variables is then defined, and the different kinds of observed data (treated as independent) are each modeled by their own likelihood distribution.

In the training phase, the training dataset consists of various short sequences; for each sequence the number of scenes, the number of streets, the angles between consecutive streets, the global rotation and the width of the streets are labeled with the help of GoogleMaps images [3]. The static and dynamic features can be observed directly, whereas the remaining parameters are inferred in the inference phase. Inference is performed independently for each scene [3].

Because the number of parameters in the model depends on the number of streets, two models of different size cannot be compared directly. A naive comparison may give wrong results, because the probability measures of the two models are not directly comparable. For this reason two additional states are introduced using a separate proposal distribution; the details of the algorithm can be found in the literature [4]. To switch between model sizes, moves were implemented that add and remove adjacent roads [3]. The acceptance ratio of such a move is just the ratio of the posterior probability of the new parameter set to that of the current one, as can be seen from the equation given in the paper [3] and from the literature on MCMC [4]. To increase the number of streets by one (jumping to the next model size), a random angle between two consecutive streets is selected and divided into two parts. The move can also be undone by removing the added street, which is why it is called a reversible jump from one state to another. In this way the described approach builds a model of the 3D scene.
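
The add/remove move described above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the implementation from the paper [3]: the scene is reduced to a list of angles between consecutive streets, toy_log_posterior is a made-up stand-in for the real posterior, and all function names and numeric constants are hypothetical. The acceptance test uses only the posterior ratio, as in the summary above; Green's general formulation [4] also contains proposal and Jacobian terms, which are left out here.

```python
import math
import random


def split_move(angles, rng):
    """Add a street: pick one angle at random and divide it into two parts."""
    i = rng.randrange(len(angles))
    u = rng.uniform(0.05, 0.95)  # hypothetical range for the split fraction
    return angles[:i] + [u * angles[i], (1.0 - u) * angles[i]] + angles[i + 1:]


def merge_move(angles, rng):
    """Remove a street: merge two neighbouring angles (reverse of split_move).

    For brevity the wrap-around pair (last, first) is never merged.
    """
    i = rng.randrange(len(angles) - 1)
    return angles[:i] + [angles[i] + angles[i + 1]] + angles[i + 2:]


def toy_log_posterior(angles):
    """Hypothetical posterior: prefers evenly spaced streets and few streets."""
    n = len(angles)
    target = 2.0 * math.pi / n
    spread = sum((a - target) ** 2 for a in angles)
    return -5.0 * spread - 0.5 * n


def rj_step(angles, rng, log_post=toy_log_posterior, min_streets=2, max_streets=6):
    """One jump between model sizes, accepted with probability min(1, ratio)."""
    grow = rng.random() < 0.5
    if grow and len(angles) < max_streets:
        proposal = split_move(angles, rng)
    elif not grow and len(angles) > min_streets:
        proposal = merge_move(angles, rng)
    else:
        return angles  # move not allowed at this model size, keep current state
    log_ratio = log_post(proposal) - log_post(angles)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return angles


if __name__ == "__main__":
    rng = random.Random(0)
    state = [math.pi, math.pi]  # two streets forming a straight road
    for _ in range(1000):
        state = rj_step(state, rng)
    print(len(state), [round(a, 2) for a in state])
```

Both moves keep the sum of the angles unchanged, so every state visited by the sampler still describes a full turn around the intersection; only the number of streets changes.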

Experimental Results and Conclusions :

The presented model is generative, so samples can be drawn from it. The approach is very general and takes both dynamic and static features into account, so by putting constraints on the dynamic features the static scene can be analysed, and vice versa. The authors report a comparison between GP regression and the presented approach [3]. The approach can also be used to improve object detection and semantic scene segmentation results thanks to its geometrical and topological constraints: a car is more likely to be on the road than inside a house or indoors. The same kind of reasoning applies to the scene itself.
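
As a rough illustration of how such a scene constraint could help a detector, the sketch below rescales car detection scores with a prior that depends on whether the detection lies on a road cell of the inferred layout. This is an assumed toy scheme, not something taken from the paper [3]: the function name rescore_detections, the grid format and the two prior values are all hypothetical.

```python
def rescore_detections(detections, road_mask, on_road_prior=0.9, off_road_prior=0.1):
    """Weight each car detection score by a layout-based prior.

    detections: list of (row, col, score) tuples in grid coordinates.
    road_mask:  2D list, truthy where the inferred layout says "road".
    The two prior values are made-up numbers for illustration only.
    """
    rescored = []
    for row, col, score in detections:
        prior = on_road_prior if road_mask[row][col] else off_road_prior
        rescored.append((row, col, score * prior))
    return rescored


if __name__ == "__main__":
    # Toy layout: the right-hand column of the grid is road.
    road_mask = [[0, 1],
                 [0, 1]]
    detections = [(0, 1, 0.8), (1, 0, 0.8)]  # same raw score, different places
    for det in rescore_detections(detections, road_mask):
        print(det)
```

Two detections with the same raw score end up ranked differently: the one on the road keeps most of its score, while the one placed on a non-road cell is strongly down-weighted.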

References :

[1] P. F. Felzenszwalb, D. A. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008.

[2] A. Ess, T. Mueller, H. Grabner, and L. van Gool. Segmentation-based urban traffic scene understanding. In BMVC, 2009.

[3] A. Geiger, M. Lauer, and R. Urtasun. A generative model for 3D urban scene understanding from movable platforms. In CVPR, 2011.

[4] P. J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711–732, 1995.

Link to a video that gives a feel for how this approach works.