a) We can use intensity-based edge detection (e.g. the Canny edge detector), or texture-based edge detection in cases where the intensity difference on either side of the edge is very small. For good results, the robot must be given prior information such as the intensity and texture of the floor and walls.

b) We can divide the floor into a grid and again use edge detection to recognize dirt. Dividing the area of dirt within a grid cell by the cell's area gives that cell a dirtiness value in [0, 1].

c) Yes; this is explained in part (b).

d) The robot uses a camera, mounted at a height of 30 cm, to capture the image. The camera cannot see some of the area directly in front of the robot, so we first find the nearest visible floor point (it lies at the intersection of the camera's line of sight with the lowermost edge of the image). We know the Z coordinate of every point on the lowermost edge of the image, and we can recover X from the pinhole relation X = xZ/f. The robot's "walking" distance to each such point is then sqrt(X^2 + Z^2). To extract full 3-D information, such as the pose of an object, we need more than one image so that tools like binocular stereopsis can be applied.
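The per-cell dirtiness score from part (b) can be sketched as follows. This is a minimal illustration, assuming dirt detection has already produced a binary mask (1 = dirt pixel, 0 = clean floor); the function name and cell size are hypothetical.

```python
import numpy as np

def dirt_fraction(dirt_mask: np.ndarray, cell: int) -> np.ndarray:
    """Fraction of dirt pixels in each cell x cell grid square, in [0, 1]."""
    h, w = dirt_mask.shape
    # Trim so the mask tiles evenly into whole cells.
    h, w = h - h % cell, w - w % cell
    m = dirt_mask[:h, :w].astype(float)
    # Reshape into (rows, cell, cols, cell) blocks and average each block:
    # the mean of a 0/1 block is exactly the dirt-area / cell-area ratio.
    return m.reshape(h // cell, cell, w // cell, cell).mean(axis=(1, 3))

# Toy 4x4 binary mask split into 2x2 cells: top-left cell is fully dirty,
# bottom-right cell has one dirty pixel out of four.
mask = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 0, 0]])
print(dirt_fraction(mask, 2))  # [[1.   0.  ]
                               #  [0.   0.25]]
```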
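The distance computation in part (d) can be sketched as below, assuming a simple pinhole camera model in which x is measured in pixels from the principal point and f is the focal length in pixels; the numeric values in the example are made up for illustration.

```python
import math

def walking_distance(x_px: float, Z: float, f: float) -> float:
    """Ground ("walking") distance to a floor point on the lowermost image row.

    x_px : horizontal image coordinate in pixels, relative to the principal point
    Z    : known depth of that point along the camera axis (e.g. in metres)
    f    : focal length in pixels
    """
    X = x_px * Z / f          # invert the pinhole projection x = f * X / Z
    return math.hypot(X, Z)   # sqrt(X^2 + Z^2)

# Hypothetical numbers: f = 500 px, nearest visible floor row at Z = 0.4 m,
# point 250 px to the side of the image centre.
print(walking_distance(250.0, 0.4, 500.0))  # ~0.447 m
```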