Project Report by Satyendra Kumar Gupta, S.Prashanth Aditya

Artificial Intelligence ME 768 Jan-Apr 2000

RECOGNITION OF CARTOON CHARACTERS

Submitted by:

Satyendra Kumar Gupta (satyend@iitk.ac.in)
S.Prashanth Aditya (aditya@iitk.ac.in)

IIT Kanpur: April 2000

Motivation
Example: Some-Sample-Task
Past Work
Sample Input-Output
Methodology
Results
Limitations and Conclusions
On-Line Links
Bibliography
Documentation of functions used and usage of the code
Acknowledgements

Motivation

The idea behind choosing this project was to develop a natural language "commentary" from a cartoon strip. The project would then involve the areas of vision (object recognition) and NLP (text generation). This, it was thought, would be a step in helping blind people and children understand and appreciate cartoons better.
The amount of work involved in such a venture would have been too much to complete in a semester. So we decided upon the simpler (relatively) subproblem of recognising a cartoon character in a cartoon strip.
We chose this out of the two subproblems at hand because this work can be extended to human objects and might be of considerable help, in that case, in areas like criminal investigations etc.

Back to contents

Example:

Consider the image below. We assume the standard set of Tintin characters to be the following:- Tintin, Snowy, Captain Haddock, Professor Calculus, Nestor and the Thomson and Thompson twins. Though we have developed the project to recognise Tintin alone, the methods used could be modified to capture any of these characters from the input image.

Here the program should be able to identify Tintin and Captain Haddock (the modern one, not Sir Francis Haddock). Another example could be

Here, we should be able to recognise Tintin, Snowy, Captain Haddock and Professor Calculus.

Back to contents

Past Work - Using the unique features of a cartoon character to identify it

Past work was focussed on isolating distinct features of a character like its hair and images were scanned using the "dark hair on light background" concept etc. In addition to this, [Syeda:1999] gives a method of scanning the entire image as well. [Yamada/Watanabe:1999] discusses some other aspects of cartoon character recognition, notably working with "characteristic pixels". This is necessary because objects could be in any orientation in the image. These characteristic pixels are crosses, junctions and the like. The hair regions are characterised by rectangular and principal moments and then advantage is taken of the fact that in most cases, the body is perpendicular to the hair region. In multiple frame cartoons, a distance function based on template similarity between objects (in each image) is used to identify individual characters.
There are quite a few papers on individual topics of our concern in this project, like template matching, segmentation of images by colour etc. [Lai/Chin:1995], for instance, deals with the extraction and modelling of deformable contours in an image. The significance of this paper with respect to our project will be illustrated in a moment.
John R. Smith's thesis on Integrated Spatial and Feature Image Systems: Retrieval, Analysis and Compression discusses various methods of extracting regions from images using colour/texture Histogram-Back-Projection techniques.

Back to contents

Sample Input and Expected Output

The input is an image. The strip we have chosen for this project is Tintin. We have developed our program to recognise Tintin. But the limitations and conclusions section will also explain how this can be extended to other characters as well.
For the input image, our goal would be to specify whether it contains Tintin or not. This will be made known to the user at the the screen itself.
The output will be an image of size equal to the input image and witha white background. It will contain the edges of the character we have recognised as Tintin.
The following example will make this clear.

Back to contents

Methodology

The methodology basically involves two steps: Segmentation of the images based on colour and Template based Feature Extraction. It revolves around the fact that Tintin (who is the character we are trying to identify), has a particular complexion and a marked feature as regards his hair.
For the segmentation part, we have chosen John Smith's thesis. Regions having a specified colour can be extracted by Histogram-Back-Projection. The thesis describes a variety of algorithms working on the basis of this method. Local histogramming achieves the best results.
Feature extraction helps to identify Tintin from a host of portions in the image which are detected in the first step. [Lai/Chin:1995] describes methods of extracting deformable contours from noisy images. It should be noticed that Tintin's hair shape, though unique, does change shape and orientation-wise across images. Therefore the chosen template should be able to deal with this. Further, scanned images are very noisy, as compared to other images available from the web. So the extraction algorithm should be good enough to detect the shape in not-so-clean images.
The following paragraphs explain the algorithms adapted from both the works mentioned above.

There is something we would like to point out at this stage. So far, we were working with a particular algorithm for the segmentation which basically works the same way Local Histogramming HBP works (though we didn't know it until very very recently!), the difference being that instead of comparing a pixel in the target region with a group of pixels which make up the model image, we used to compare each individual pixel with a value which we arrived at through some experimentation. So our algorithm was not very robust, and importantly, we couldn't do much about the noise we got due to scanning.

Histogram-Back-Projection by Local Histogramming to detect colour regions in an image.

HBP determines the most likely spatial position of a given histogram in an image. According to John Smith, it is defined as "a class of algorithms designed to detect within images the feature histograms that are similar to a given model histogram".
Combined with what is known as the Visual Feature Library (VFL), HBP constitutes a powerful solution for extracting regions from a large collection of unconstrained images.
HBP consists of four steps: back-projection, smoothing, thresholding and detection. The Local Histogramming method combines the smoothing and back-projection steps. It works as follows: For each point in the target image a local neighbourhood histogram is computed and its distance to the model histogram determines whether each image point belongs to the model. (The model image provides the model histogram. The model images constitute the VFL.)
The concept of "distance" between histograms is simple to understand. If there are two histograms (considered points in a histogram space), there is a number called the distance between them which is non-zero if they are different and zero if they are the same. The distance also follows the rules of commutativity and symmetry as well as the triangle inequality (given histograms 1,2 and 3 D(1,2)+D(2,3)>=D(1,3)).
There are many ways to calculate the distance. Anything is valid as long as it conforms to the above laws. We use the Minkowski-form distance, defined as follows: Let and be the query and target histograms, respectively, then

, where r could be 1 or 2. We choose r = 1 - essentially we get the same results.
John Smith also presented two efficient strategies for the computation of the histogram queries. The first, query optimized distance (QOD) computation, prioritizes the elements of the computation in the order of largest significance to the query. By prioritizing the computation in this way, the query is computed more efficiently. Furthermore, by stopping the computation process to only approximate the actual histogram distances, the query response-time is reduced dramatically without significantly decreasing the retrieval effectiveness.
As far as the histograms are concerned, the range of 256 intensities is divided into 8 bins of 32 each. The bins are then normalised according to the following function:, where again r = 1 or 2 and we choose r = 1.

This is a feature vector representation of discrete-space, two-bin normalised histograms (with r = 1).

Minkowski form distances compute differences between the same bins, not even the same kind of bins. As John puts it, in Minkowski form distance computation, deep red is as different from red as it is from blue.

NOTE:- ALL THESE IMAGES WERE TAKEN FROM JOHN SMITH'S THESIS PAGE.

Here, we would like to present a comparison of the algorithm we had used earlier and this one. The first image that you see is a scanned image of Tintin - an example of the input our program would accept (in ppm format, of course). The second image you see is the segmented version of the first the way we were doing it earlier. The third is the result of the new algorithm.
The results page will show better examples of segmented images. These were achieved by using some other simle algorithms in conjunction with the HBP algorithm. This is schematically represented in the image shown below these three.

The salient features of our previous algorithm:

The intensity function used was sqrt(r*r+g*g+b*b)/3.
Each pixel was checked to see if it belonged to a particular range of intensity values, as generated by the function above, and then selected or not selected based on this test. Those points which didn't belong to this range were replaced by white.

The salient features of the new algorithm we have used:

The intensity function concentrates more on the red and green components than the blue, simply because Tintin's face is stronger in red and green than in blue. In fact, the intensity function is value = 0.5*red + 0.4*green + 0.1*blue.
Each point is the center of a small block known as the model image and the distance between the histograms of the model image and the target image region is considered before selecting/not selecting the pixel.

The first thing that strikes as the difference between the two images is that the latter doesn't give as much information as the former in terms of edge-detail. What we essentially get is a rough idea of where in the image we'll get the colours we are looking for. But another thing to be noticed is that noise is completely eliminated in the latter image.

Limitations and Conclusions

Due to problems like the difference in the quality of scanning across images, and "differently coloured" Tintin's in different comics, the HBP algorithm works well (mostly) only in the case when the model images in the VFL are taken from the same image which has to be processed. This defeats the purpose of the project to some extent, but it also says that the problem is something about which we can do nothing much about, unless of course we think of an alternative method of identifying Tintin other than the Segmentation-Feature Extraction routine.
The HBP algorithm also detects all regions of the image which have roughly the same colour distribution as the model image. What this means is, since all the main characters have the same colour (facial), this method of segmentation can be used as a primary step to detect almost any major Tintin character (with the exception of Snowy and the one odd cats in Marlinspike Hall!!).
Also, this algorithm works irrespective of the background colour.
After seeing the results page and the cases in which the HBP method doesn't produce very satisfying results, we feel that removing the dialog boxes might help.
We also did not have an extensive VFL. We had only one or two model images for each image and we checked our code on those. What we should have done was to have a model image which was the mean of several model images taken from many images and then applied that mean model and check for the distnace between histograms such that it lies between the values mean-standard dev. and mean+standard dev., where the standard deviation is also calculated from many images. We expect better results with this.
We were also not able to implement the feature extraction part. Character recognition is all about verifying as many features as possible to get closer to recognising the required character. So feature recognition would have been a good final step in the recognition procedure. This is especially true in this case because almost all characters in the Tintin books have more or less the same facial colour. The importance of feature extraction can br gauged by the results obtained under the "unsuccessful search" section in the results page.

To the results section

Back to contents

On-Line Links

Back to contents

Annotated Bibliography

@Article{Syeda:1999,
author = { T F Syeda-Mahmood.},
year = { 1999},
keywords = { IMAGE TEXTURE SEGMENTATION},
institution = { IBM Almaden Research Centre,CA},
journal = { Computer Vision and Image Understanding},
title = { Detecting Perceptually Salient Texture Regions in Images},
month = { October},
volume = { 76},
pages = { 93-108 },
e_mail = { stf@almaden.ibm.com},
annote = {
	This paper reports on the techniques and the heuristics applied and used to
	group the salient texture regions in the natural scenes.It works out the
	possible presence of distinctive pattern localised to some arbitrary
	position of any given image.The saliency of the texture region can be
	inferred by analyzing the interplay between the bright and dark region
	against a gray background. This condition is met by using convolution
	like-LOG(Laplacian of Gaussian).One of the problem faced by the
	author is the selection of the relevant area to which analysis is to be
	applied.The heuristic proposed is to assign weight functions to different
	visible pattern and obtain the relevant portion by the MST(minimum
	spanning tree) of the region.

	An interesting part of the paper is texture segmentation method by
	moving window analysis and LP-spectrum difference measure .It first
	divide the image into different overlapping window and by series of
	iteration it recontructs the missing portion of the boundary fragments,
	thus in a way parsing the whole image.

	One important aspect of the paper which remained unclear is choice
	of weight function based on "psychophysical experimental data",
	what I propose is to use use fuzzy logic.Each pattern type can be mapped
	to some prototype ,and appropriate parameters can be obtained using
	fuzzy laws and principles.Over all the paper is well presented and the
	end results are well exemplified
}

@article{Yamada/Watanabe:1999,
author = { Yamada, T. and Watanabe, T.},
year = { 1999},
keywords = { CARTOONS-RECOGNITION EXTRACTION PERSON-SLOPE VECTORIZATION IDENTIFICATION },
institution = { Graduate school of engineering,Nayoya University},
title = { Identification of Person Objects in Four-Scenes Comics of Japanese Newspapers},
booktitle = { Intl. wkshp. on Graphics Recognition,GREC`99},
pages = { 152-159},
email = { {yamada,watanabe}@watanabe.nuie.nagoya-u.ac.jp}
annote = {
This paper describes a method to identify a comic character. The subjects which extract meaningful data
from paper based documents and make it possible to manipulate the extracted data as operative objects in
electronic formed documents are looked upon as interesting and worthy issues, and have been used for various
kinds kind of documents until today. In these type of comics some objects are illustrated in different
scenes, but these shapes and sizes, locations, directions, whole-part relationships, and so on are
tremendously deformed and changed. The analyzing method is based on the distance distribution among
composite elements.
With respect to extraction and identification of person objects, the basic recognition strategy focuses on
the hair area continuously. Here area has been defined as the area of continuously black painted pixels
whose number is more than experimental threshold value. The next processing stage is to extract the areas
related to person objects with respect to the hair areas in the previous stage. In this case it is effective
to determine the searching directions first in order to extract person objects. The characteristic pixels
are looked at in handwriting line segments, which assign some remarkable features to drawn lines in
directions structure and so on.
Using the characteristic pixels, handwriting line based drawings are vectorized. Basically, two
neighbouring characteristic pixels in one handwriting line segment are transformed into one straight
line segment. The final processing in the extraction phase of person objects is to select the vectorized
line segments in the estimated person region and distinguish only reliable line segments as composite
elements of person objects. In the identification phase of the same person object, the objective is to find
out the same person object from four different scenes.
}

@Article{Lai/Chin:1995,
author = { Lai, Kok F. and Chin, Roland T.},
year = { 1995},
keywords = { DEFORMABLE-MODEL FEATURE-EXTRACTION},
institution = { },
title = { Deformable contours: modeling and extraction},
journal = { IEEE transactions on Pattern Analysis and Machine Intelligence},
month = {},
volume = { 17 (11), 10841090},
pages = { 28},
e_mail = {},
annote = {
	This paper considers the problem of modeling and extracting arbitrary deformable contours from noisy
	images. The authors propose a global contour model based on a stable and regenerative shape matrix,
	which is invariant and unique under rigid motions. Combined with Markov random field to model local
	deformations, this yields prior distribution that exerts influence over a global model while allowing
	for deformations. The authors then cast the problem of extraction into posterior estimation and show
	its equivalence to energy minimization of a generalized active contour model. The paper discusses
	pertinent issues in shape training, energy minimization, line search strategies, minimax regularization
	and initialization by generalozed Hough transform. Finally, experiment results are presented and a
	performance comparison vis-a-vis rigid template matching is conducted.

	The major problem with this paper is that it uses a few structures and term without really making it clear
	to the reader how they are actually calculated. For instance, the concept of variance at a point (with regard
	to deformations of the contour) is used throughout the paper. This as such can be intuitively understood by
	the reader but there are no details like the points in the image which were considered to calculate this.
	This poses a very big problem if we have to implement the algorithms discussed in the paper, which I should say
	really good.

					- S.P.Aditya, 09/04/2000
}

NOTE:- The CVPR/PAMI/thesis versions [Lai/Chin:1995] can be downloaded in zipped/normal versions from Cora's pages.

Back to contents

Documentation of libraries used and usage of the code

We have worked with images of the type .ppm and .pgm throughout the project. So we have used the libraries "ppm.h", "pgm.h" and "pbm.h" and linked them to our code, which is written in C.
The input image has to be converted to the ppm format. Then the program is compiled using the following command line command:
$ cc -lpgm -lppm -lpbm -lm test.c
The name of our program is test.c. This program deals with the HBP and segmentation.
Then the following command is executed at the prompt:
$ a.out imagename.ppm
Here, imagename is the name of the input image.
An image named model_image.ppm is used as a single image from the VFL in the program. As of now, this model image is taken from the same image and is also converted into a .ppm file. This could not be helped because of the non-uniformity in facial colour across images.

Now a listing of all the functions/subroutines used in the code is given, along with a brief description of what they do:

Gaussian - Calculates the Gaussian function value. Arguments are the variable and sigma. The output as well as the arguments are all floating point variables.
This function is used by the edge detection routine, which uses a Canny filter.
Gprime - Calculates the derivative of the Gaussian. Arguments are again the variable and sigma. The data type of the arguments and the output is again floating point.
This function is also used by the edge detection routine.
filter_gen - Generates the Canny filter. Arguments are the scale, the direction and the sigma. The parameter direction specifies which filter should be generated - vertical or horizontal. The function returns a void. The last last argument given above is a floating point data type whereas the other two are integers.
This function is also used by the edge detection routine.
convolve - Convolves the image matrix with the Canny filter. The arguments are the scale (integer), the input image matrix (passed as gray**), the row and column sizes (integers both) and the direction (integer). The direction specifies the filter (horizontal or vertical) with which the image has to be convolved.
This is used by the edge detection routine.
normalize - This function normalizes the histogrmas used in the HBP method. It takes in an array of floats as argument. There are 8 elements in the array - each for each of the 8 bins into which we divide the entire histogram range of 256 intensities into. The function returns nothing.
This is used in producing the segmented_image.
initialize_v - This function initializes all the bins of the histogram to 0. Takes in the array of floats and returns nothing.
The function is used by the routine which generates the segmented image.
initialize_difference - This function initializes all the differences between the corresponding bins of both the histograms used (model and target image) to zero. Since this difference is calculated at each point in the image, each point has an associated difference value which this function initializes to 0. So this function takes in a two dimensional floating point matrix equal in size to the input image matrix and initializes all its elements to 0. The function returns nothing.
This function is used by the routine which generates the segmented images.
use_formula - This function calculates the difference between bins based on the distance formula and returns its modulus (float). The arguments are two one-dimensional floating point arrays.
This is also used by the segmented image-producing routine.
check_threshold - This function checks the difference associated with each point in the image with the threshold and accordingly paints the corresponding point in the segmented image matrix white or black. The function takes in as arguments the differences matrix (two-dimensional floating point), and two gray two-dimensional matrices (one is the input image and the other contains the segmented image). The function returns nothing.
This is used by the segmented image-producing procedure.
find_difference - This function gets the difference between the model image histogram and the target image portion histogram. It takes in as arguments four two-dimensional gray matrices - for four grayscale images. One is the model image, one is the input image, one is an image of the same size as the input image and whose points are black or white (according as whether the corresponding points in the input image have their r-value greater than both the g and b-values or not), and the last one is a completely white image. The last two are used for subsequent passing to other functions which are called by this function. This function returns nothing.
do_edge_detection - This function does all the work required for edge detection. It calls all those functions listed above which are used for detecting edges. It takes no arguments and returns nothing.
remove_extra_portions - This function checks where the segmented portions are in terms of the range of points and organises a search for black points (by calling check_with_other) in the edge detected image in the range obtained. The function takes no arguments and returns nothing.
check_with_other - This function does the actual searching within a certain range of the point passed to it by the function remove_extra_portions. The latter initializes all the variables required by this function.
This function takes in as arguments three two-dimensional gray matrices (one is the edge-detected image, one is the segmented image and the third is a fresh image which holds the output ultimately) and two integers - the coordinates of the point around which the search has to be performed.
main - This function takes in one argument and that is the name of the input image. This is given at the command prompt. It also does all the other work that is required for the rest of the program to run.

Back to contents

Acknowledgements

Dr. Amitabha Mukherjee

Dr. Rahul Sukthankar

Back to contents

Amitabha Mukerjee

[ COURSE WEB PAGE ] [ COURSE PROJECTS 2000 (local CC users) ]

Contents

Motivation

Example:

Past Work - Using the unique features of a cartoon character to identify it

Sample Input and Expected Output

Methodology

Histogram-Back-Projection by Local Histogramming to detect colour regions in an image.

Limitations and Conclusions

On-Line Links

Annotated Bibliography

Documentation of libraries used and usage of the code

Acknowledgements