Asha Tarachandani and Pooja Nath
IIT Kanpur : April 2000
The purpose of this project is to take English handwritten documents as input, recognize the text, and modify the handwriting to produce a beautified version of the input. The project thus comprises two parts - handwriting recognition and beautification.
Handwriting recognition : A lot of work has been done in the field of character recognition, but much less on analyzing a complete document. Recognizing the text of a document would be useful in many diverse applications, such as reading medical prescriptions, bank cheques and other official documents. It would also find uses in police and forensic work, in applications like handwriting-based person identification and distinguishing genuine documents from forged ones.
Handwriting recognition can be broken into a number of relatively independent modules. After going through several papers and web pages on handwritten word recognition, we considered various strategies for each of these modules and evaluated their accuracy and efficiency both independently and as a whole. We also came up with a new idea based on feature extraction and relative-position matching with the help of directional graphs. Our project implements this idea, keeping in mind the strategies we considered best for accurate, efficient and scalable handwriting recognition software.
Handwriting Beautification : Because of individual differences in handwriting, a 100% accurate handwriting recognition system has not yet been developed. Human reading ability depends on various factors, including knowledge of grammar, context, etc., so it is difficult to build artificial methods that recognize documents as accurately as humans do. Moreover, even a recognition system with dictionary and context-understanding capabilities cannot guarantee accuracy in recognizing proper nouns. Due to these limitations, we present a scheme for handwriting beautification which modifies the document so that it becomes more presentable while remaining in the individual's handwriting. The output of this scheme can then be used by humans to read deformed documents.
The most common reasons for illegibility of English handwritten text are :
The primary aim of this project is to recognize and beautify address
blocks. The Joint Entrance Examination office of IIT, Kanpur receives numerous
applications every year and they have to send the required documents to
all the applicants. For this purpose, the applicants are required to write
their addresses in the address blocks provided. A major problem the JEE office faces nowadays is that many posted documents do not reach the applicant because of illegible handwriting in the address block. In this project we aim to recognize and beautify the address blocks to alleviate this problem.
Since handling capital letters is much easier than handling small letters,
we limited our domain to capital letters. It is reasonable to make it mandatory
for the applicants to fill in the address block with capital letters. Some
examples are provided in the following section.
Input :
An address block like :
Output:
The text of the address block:
ANIL KUMAR SHUKLA S/O
SHRIRAMBABUSH
Ismail pur (Ameen Gunj)
SoRAM, ALLAHABAD
[Vaidya et al :1999] use a feature-based approach for numeral recognition: a statistical method that assigns weights to each feature and assesses the numerals using these weights. We also use a feature-based approach to recognize handwritten words. Our problem differs from theirs in that words may be written cursively whereas numerals are not, and it is not always possible to break words into characters. Hence we have to use a continuous process of matching the set of features to a database while accounting for the permutations as new features come into view and old ones are discarded. Also, the set of features for alphabets is larger than that for numerals.
[Nicchiotti/Scagliola:1999] have shown some good examples of normalization by removing unwanted variations in cursive handwriting using estimation and compensation of reference lines and slant angle. In the preprocessing of documents for the purpose of recognition and beautification, normalization is an important step to facilitate feature extraction.
[Spitz:1998] uses a character shape coding process for typed word recognition. The system has a small dictionary to which all the words in the document belong. After scanning, words are classified on the basis of the regions their characters occupy (extending above the middle line, extending below the bottom line, or lying completely between the two). This narrows down the range of possibilities for the word, which is then matched against all of them. We considered this approach, but it would have been highly inefficient in our more general setting, which is restricted neither to a small fraction of a dictionary nor to typed documents where the characters are easily distinguishable.
Given an address block in PGM format, the project proceeds through the following steps to generate the desired output.
(Figure: partial thinning on the example)
(Figure: [Datta/Parui] algorithm applied on the partially thinned image)
(Figure: thinned and slant-corrected image)
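As a minimal sketch of the slant-correction step, assuming the slant angle has already been estimated (e.g. by the iterative method of [Ding:1999], which is not shown), each row of the binary image can be sheared horizontally so that slanted strokes become vertical. The function name and the pixel conventions below are our own illustration, not the project's actual code:

```python
import math

def shear_rows(img, angle_deg):
    """Shear a binary image (list of rows of 0/1 pixels) so that strokes
    slanted angle_deg from the vertical become vertical. Pixels shifted
    outside the image are dropped."""
    n_rows, n_cols = len(img), len(img[0])
    t = math.tan(math.radians(angle_deg))
    out = [[0] * n_cols for _ in range(n_rows)]
    for y in range(n_rows):
        # measure height from the bottom row so the baseline stays fixed
        shift = round((n_rows - 1 - y) * t)
        for x in range(n_cols):
            if img[y][x]:
                nx = x - shift
                if 0 <= nx < n_cols:
                    out[y][nx] = 1
    return out

# a stroke slanted 45 degrees to the right of vertical
slanted = [[0, 0, 0, 1],
           [0, 0, 1, 0],
           [0, 1, 0, 0],
           [1, 0, 0, 0]]
upright = shear_rows(slanted, 45)
```

Measuring the shift from the bottom row keeps the baseline fixed while the tops of tall strokes move the most, which matches how slant distorts handwriting.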
As the results depend heavily on the accuracy of this step, special care is taken to make sure that the features detected are correct. For this purpose, we experimented extensively with the order of the feature-detection steps and with the values of the thresholds involved.
For example, to detect a vertical line we check whether there is an almost vertical run of black pixels whose length exceeds some threshold. The threshold we used earlier was 5/8 of the height of the line being scanned. This gave poor results in cases where a name is written with the first character much bigger than the rest. We therefore changed the threshold to 5/8 of the height of the character being scanned rather than the line, which improved feature detection tremendously.
Also, we observed that if we tried to detect slant lines after vertical lines, some of the vertical lines were mistaken for slants. To overcome this problem, we tightened the threshold for slant lines and marked the pixels of detected vertical lines so that they are not detected again.
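The per-character threshold described above can be sketched as follows. This is a minimal illustration, assuming the image is a list of rows of 0/1 pixels; the function names are ours, not the project's actual code:

```python
def char_height(char_img):
    """Height of the bounding box of black (1) pixels in a character image."""
    rows = [i for i, row in enumerate(char_img) if any(row)]
    return rows[-1] - rows[0] + 1 if rows else 0

def find_vertical_lines(char_img, frac=5 / 8):
    """Return (col, start_row, run_length) for every vertical run of
    black pixels longer than frac times the height of *this character*
    (not of the whole text line)."""
    threshold = frac * char_height(char_img)
    n_rows = len(char_img)
    n_cols = len(char_img[0]) if n_rows else 0
    features = []
    for j in range(n_cols):
        i = 0
        while i < n_rows:
            if char_img[i][j]:
                start = i
                # walk down the run of black pixels in this column
                while i < n_rows and char_img[i][j]:
                    i += 1
                if i - start > threshold:
                    features.append((j, start, i - start))
            else:
                i += 1
    return features

# an 8-pixel-tall letter "L": only the left column is a vertical line
letter_L = [[1, 0, 0]] * 7 + [[1, 1, 1]]
```

Because the threshold is computed per character, an oversized initial letter no longer inflates the bar that every other character's strokes must clear.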
Features: These are a predefined set of elementary parts which, combined in various fashions, constitute all the characters. Examples of features are the vertical line, horizontal line, left curve, right curve, etc. Each feature has an associated set of x and y co-ordinates with respect to the figure.
Characters: Depending on the relative positions of the features in it, a character can be viewed as a directional graph of features, which we call a feature-graph. For example, the division of the letter a into features and the corresponding directional graph is as follows :
Thus, we create a database for handwriting recognition consisting of a set of feature-graphs for each of the 26 letters of the English alphabet. Further, some characters like a, f, g, etc. may be written in more than one style; this is taken care of by maintaining graphs for all such styles.
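A minimal sketch of such a feature-graph database is shown below. The feature names, relations and the two entries are hypothetical; the real database covers all 26 capital letters and multiple styles per letter:

```python
# Each stored character is a small directed graph: nodes are feature
# types, edges record the relative position of one feature w.r.t.
# another. An edge (u, relation, v) reads "feature u is <relation> v".
FEATURE_GRAPHS = {
    "H": {"nodes": {"vline1", "vline2", "hline"},
          "edges": {("vline1", "left-of", "vline2"),
                    ("hline", "connects", "vline1"),
                    ("hline", "connects", "vline2")}},
    "L": {"nodes": {"vline", "hline"},
          "edges": {("hline", "at-bottom-of", "vline")}},
}

def classify(detected_nodes, detected_edges):
    """Return every character whose stored graph is contained in the
    detected feature graph (a crude subgraph test by feature type)."""
    return [ch for ch, g in FEATURE_GRAPHS.items()
            if g["nodes"] <= set(detected_nodes)
            and g["edges"] <= set(detected_edges)]
```

Alternative writing styles of the same letter would simply appear as additional graphs keyed to the same character.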
(Figure: original image; image after thinning; image after slant correction)
(Figure: partial magnified view of the above thinned image, before and after slant correction)
(Figure: original image; image after thinning; image after slant correction; thinning after slant correction; slant correction after thinning)
(Figure: original image; image after partial thinning; image after thinning; image after slant correction)
Some preliminary results on feature detection (the test images are omitted here) :

Test image 1 (candidates for character 0 : A) :
long horizontal feature at i=18, j=12, length=23
short horizontal feature at i=0, j=15, length=11
long vertical feature at j=18, i=22, length=29
short vertical feature at i=21, j=1, length=27

Test image 2 (candidates for character 0 : IVI / M) :
long vertical feature at j=16, i=0, length=32
long vertical feature at j=18, i=21, length=27
rt dia feature at i=9, j=6, height=17, width=11
left dia feature at i=8, j=15, height=18, width=9

Test image 3 (candidates for character 0 : IV / N) :
long vertical feature at j=17, i=24, length=35
long vertical feature at j=18, i=0, length=33
rt dia feature at i=17, j=11, height=27, width=22

Test image 4 (candidates for character 0 : V) :
rt dia feature at i=17, j=8, height=31, width=16
left dia feature at i=16, j=23, height=33, width=17

Test image 5 (candidates for character 0 : Z) :
long horizontal feature at i=26, j=15, length=32
long horizontal feature at i=0, j=15, length=30
left dia feature at i=13, j=15, height=24, width=28
Word-level results (candidate characters at each position; the corresponding word images are omitted here) :

Word 1 (features : horizontal - 9) :
character 0 : II / CI / II / A
character 1 : L
character 2 : L
character 3 : III / CII / III / AI / DI / OI
character 4 : LI / H
character 5 : _

Word 2 (features : horizontal - 12) :
character 0 : IV / N
character 1 : E
character 2 : E
character 3 : L
character 4 : II / CI / II / A
character 5 : J

Word 3 (features : horizontal - 6) :
character 0 : LI / U
character 1 : T
character 2 : T
character 3 : II / A
character 4 : L

Word 4 (features : horizontal - 5) :
character 0 : A
character 1 : LT
character 2 : I
character 3 : T

Word 5 (features : horizontal - 6) :
character 0 : IJI / A
character 1 : I
character 2 : ICI / II / A

Word 6 :
character 0 : A
character 1 : T
character 2 : LI
character 3 : LI / U

Word 7 :
character 0 : FI
character 1 : L
character 2 : L
character 3 : FII / AI
character 4 : LI / H
character 5 : _

Word 8 :
character 0 : IV / N
character 1 : E
character 2 : E
character 3 : L
character 4 : FI / A
character 5 : J

Word 9 :
character 0 : LI / U
character 1 : T
character 2 : T
character 3 : FI / A
character 4 : L

Word 10 :
character 0 : A
character 1 : TT
character 2 : I
character 3 : T

Word 11 :
character 0 : _
character 1 : I
character 2 : IFI / A

Word 12 :
character 0 : FI / A
character 1 : LI / U
character 2 : V
character 3 : L
character 4 : I
character 5 : Z
character 6 : I
An extension to this project could be to use weights and spell checking in the Continuous Matching of step 9. The algorithm of continuous matching with weights is described below.
Continuous Matching (with weights): The handwritten word from the document is broken up into the predefined features. A continuous process of determining the relative positions of these features and comparing them with the database of feature-graphs goes on until a match is obtained. A weight is attached to each candidate character depending on the extent to which it matches the set of features. Note that the past history is not completely erased in the continuous process, so more than one possible match for the word is obtained. In this way, mistakes like a d being read as a c followed by an l can be avoided.
The weight of each word is the sum of the weights of its characters. These words are spell-checked and the correct one with the highest weight is chosen.
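The weighted selection step can be sketched as follows. The dictionary, the candidate lists and the numeric weights here are purely illustrative:

```python
from itertools import product

# hypothetical spell-check dictionary
DICTIONARY = {"HELLO", "HELLA", "HFLLO"}

def best_word(candidates):
    """candidates: one list per character position of (char, weight)
    alternatives kept alive by the continuous matcher. A word's weight
    is the sum of its characters' weights; the heaviest word that
    passes the spell check wins."""
    best, best_score = None, float("-inf")
    for combo in product(*candidates):
        word = "".join(ch for ch, _ in combo)
        score = sum(w for _, w in combo)
        if word in DICTIONARY and score > best_score:
            best, best_score = word, score
    return best

# candidate characters with match weights for a five-letter word
cands = [[("H", 0.9)],
         [("E", 0.8), ("F", 0.6)],
         [("L", 0.9)],
         [("L", 0.9)],
         [("O", 0.7), ("A", 0.5)]]
```

Keeping several weighted alternatives per position is what lets the spell check recover from a locally wrong but high-scoring character.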
@Article{Spitz:1998,
author = { Spitz, L. },
title = { Shape-based word
recognition},
journal = { International Journal on Document
Analysis and Recognition },
year = { 1998},
keywords = { WORD RECOGNITION, SHAPE CODES },
institution ={ Document Recognition Technologies Inc. },
month = { October },
pages = { 178 -190 },
annote = { This paper uses an interesting
method to recognize typed documents by recognizing the shape of individual
words. The characters are classified as x (those with no points above the
middle line) and non-x type. The document can contain words only from a
limited dictionary. Hence, once the word shape is built up from character
shapes, the word is identified by template matching with all the possible
words in the dictionary. For example, a word of the shape AxAA (where A
denotes non-x and x denotes x type) could only match words like Bell, Both,
Bull etc.
}}
@InProceedings{Vaidya:1999,
author = { Vaidya, V. and Deshpande, G.
and Shirole,B.},
title = { Statistical Approach to
Feature Extraction for Numeral Recognition from Degraded Documents},
booktitle = { International Conference on Document Analysis
and Recognition},
year = { 1999},
keywords = { FEATURE EXTRACTION, NUMERAL RECOGNITION},
institution ={ Siemens Information Systems Limited },
month = { August },
pages = { 273-276 },
annote = {This paper talks about numeral
recognition which is a very important issue for reading pin codes and bank
cheques. It divides the numeral to be recognized into features like loops,
horizontal lines etc. and then compares them with the standards stored
in the database. It assigns weights to the numerals depending on the
visibility of each feature and on the match between the features of the
handwritten numeral and the standard numeral in the database.
}}
@Article{Datta/Parui:1994,
author = { Datta, A. and Parui, S. K. },
title = { A Robust Parallel Thinning
Algorithm for Binary Images },
journal = { Pattern Recognition },
Volume = { 27 },
Number = { 9 },
year = { 1994 },
keywords = { BINARY OBJECT, ITERATIVE THINNING, SKELETON,
MEDIAL AXIS, ROBUSTNESS },
institution ={ Indian Statistical Institute },
month = { March },
pages = { 1181-1192 },
annote = { The aim of this paper is to
determine the skeleton of a binary image. For this purpose, the unnecessary
black pixels are removed from the image iteratively. This is done by considering
the black pixels one by one and removing them if (a) It has a black and
a white pixel neighbour along a given straight line, (b) Its removal leaves
the connectivity of the black pixels unchanged, and (c) It is not the last
black pixel in a linear chain of black pixels.
This paper presents some excellent results. For example, the skeleton
of a triangle comes out to be an inverted T. Since the algorithm preserves
connectivity and does not remove end-pixels, it is robust and is able to
provide good results for many arbitrary shapes.
}}
@InProceedings{Ding:1999,
author = { Ding, Y. and Kimura, F. and
Miyake, Y. and Shridhar, M. },
title = { Evaluation and Improvement
of Slant Estimation for Handwritten Words },
booktitle = { International Conference on Document Analysis
and Recognition },
year = { 1999 },
keywords = { SLANT ESTIMATION, HANDWRITING },
institution ={ Mie University and University of Michigan-Dearborn },
month = { August },
pages = { 753-756 },
annote = { This paper provides a simple
iterative algorithm for estimating the slant of a word. They are able to
reach some very good results. }
}
@Article{Madhavan:1999,
author = { Madhavan, S. and Kim, G. and
Govindraju, V.},
title = { Chaincode Contour Processing
for Handwritten Word Recognition },
journal = {IEEE Transactions on Pattern Analysis and Machine
Intelligence },
year = { 1999 },
Volume = { 21 },
Number = { 9 },
keywords = { HANDWRITING RECOGNITION, WORD RECOGNITION,
CONTOUR PROCESSING, CHAINCODE },
institution ={ IEEE Computer Society },
month = { September },
pages = { 928-932 },
annote = { According to this paper, the
first step to word recognition is edge detection after which a counter-clockwise
scan of the contour thus generated leads to identification of the characters.
Using this counter-clockwise scan, the authors try to determine the features
in the word.
}}
@InProceedings{Morita:1999,
author = { Morita, M. and Facon, J. and
Bortolozzi, F.},
title = { Mathematical Morphology
and Weighted Least Squares to Correct Handwriting Baseline Skew },
booktitle = { International Conference on Document Analysis
and Recognition },
year = { 1999},
keywords = { BASELINE SKEW, MORPHOLOGY },
institution ={ PUCPR and UFPR and ETs },
month = { August },
pages = { 430-433 },
annote = { This paper talks of an approach
to correct baseline skew specifically tailored to bank cheques. Correcting
baseline skew is an important step in preprocessing to reduce the variation
in handwritten texts. It uses a statistical approach with weighted least
squares method for this purpose.
}}
@InProceedings{Lee/Gomes:1999,
author = { Lee, L. and Gomes, N.},
title = { Automatic Classification
of Deformed Handwritten Numeral Characters },
booktitle = { International Conference on Document Analysis
and Recognition },
year = { 1999 },
keywords = { HANDWRITTEN, NUMERAL },
institution ={ State University of Campinas, Brazil },
month = { August },
pages = { 269-272 },
annote = { This paper talks of classifying
handwritten numerals based on features. If classification is not possible
in the first stage, neural networks are applied. The authors claim that
their method gives correct output in 92.4% of cases. Such software is very
useful in reading bank cheques.
}}
Acknowledgement:
We would like to acknowledge that the images used in this report were
provided by JEE office of IIT, Kanpur.
This report was prepared by Asha Tarachandani and Pooja Nath as a part of the project component of the course on Artificial Intelligence in Engineering in the January semester of 2000. (Instructor : Amitabha Mukerjee)