Artificial Intelligence ME 768
FINAL PROJECT REPORT
PCA based low-resolution
face reconstruction
Ashwini Damle (9911106)
Vamsi Chikati (9911154)
IIT Kanpur : April 2000
Motivation
Video telecommunication applications viz. video telephony, video
conferencing, and distance learning are the direct motivation for this
project. These applications involve lengthy, real-time video sequences
whose transmission over the network and storage demand much of the
bandwidth of the medium. Typically, a compressed video stream at 20-25
frames/second exceeds the 64 Kbps capacity of the medium. Transform-based
video coding standards viz. H.261 achieve low-bitrate real-time video
coding; in model-based methods, however, a satisfying compression ratio
is yet to be achieved, and there is still a tradeoff between bit rate and
the quality of the reconstructed video sequence.
The distinguishing feature of the above-mentioned telecommunication
applications is that the image sequence mostly contains the face of one
and the same person, with slight changes in scale, lighting conditions,
and/or orientation about the vertical axis. In a model-based approach,
this knowledge can be used effectively in the codec (coder-decoder) for
such a video stream, so that transmission and storage become much more
efficient than with existing transform-based standards viz. H.261.
One such technique is Principal Components Analysis (PCA), which aims at
dimensionality reduction of the image space, provided the image contains
pixels that are highly correlated.
The present work is a step towards using PCA for video coding in
applications viz. distance learning that deal primarily with the human
face. The experiments in this project aim at testing PCA for its
suitability as a video-coding technique and at setting at least a
preliminary benchmark for tuning PCA towards "very low bitrate + quality"
video coding. Obvious experiments include comparing the reconstructed
video stream with the original sequence for similarity in gaze, head
orientation, and facial expressions, to name a few; testing the
synchronization between audio and video; and obtaining the compression
ratio and the bitrate requirement of the encoded stream for various
spatial and temporal resolutions.
These experiments will form part of the thesis work of Ashwini Damle,
one of the project-group participants.
Sample Input/Expected Output :
(Original image and reconstructed image with accuracy indicator.)
Relation to past work
Past work was published in the paper titled "Linear Combination of Face
Views for Low Bit Rate Face Video Compression" by Ioannis Koufakis and
Bernard Buxton (Download paper). The authors proposed representing each
new face as a linear combination of three basis views of the same
person's face. Changes in the eyes and mouth regions are encoded using
PCA. Using control points, they compute the 14 coefficients of the
linear combination; this information is enough to transmit and
reconstruct the face image. The limitation of this work is that combining
the results for the face with those for the eyes and mouth causes
distortion in the reconstructed face and also produces incorrect
'expressions' on the face.
In our work, PCA analyses the face image holistically, not in parts. As
we constrain the complete image size to 64x64, our results satisfy the
low-bitrate constraint.
The reason we opted for PCA is the 'dimensionality reduction' that PCA
can achieve. This is tested and demonstrated in the work of Sami Romdhani
titled "Face Recognition Using Principal Components Analysis". The main
aim of his work is to show that PCA is the best method to reduce the
working space, and that face space is the optimal space for face
description.
(Figure: image space and the face cluster.)
Because our main aim is to reduce the transmission overhead, we
concentrate on dimensionality reduction, and PCA was one of our natural
choices. Our work is concerned with reconstruction only, whereas his work
also tests PCA for the face recognition application.
For face reconstruction, Kenneth B. Russel has successfully used
eigenspaces to recover the 3D structure and visual appearance of the
human head in real time. As our basic aim is to reconstruct using less
data, we also use eigenvectors to encode and their dot products to
reconstruct.
Methodology
Hardware Used:
Video Camera
Card for getting analog video input signal
PC with single monitor
Software Used:
MIL (Matrox Imaging Library): image grabbing and display software
Principal Components Analysis (PCA)
software developed by the author of the referenced paper.
OS: Windows (NT/98)
Implementation
Setup:
For the face images, we make the background black and feed the grabbed
face images directly to the PCA module, so there is no overhead of
locating the face region in the images.
Execution:
We execute our program in two phases: a training phase and a
reconstruction/testing phase.
Training phase:
We grab a video stream of 200 images of ASHWINI at 10 frames/sec with all
possible variations in pose, expression, etc. The training video sequence
is trainclip.gif (run the file using: xanim trainclip.gif on Linux). This
grabbing is done with the help of the MIL software. We give these 200
images to PCA to obtain 200 eigenvectors, which serve as the face model
of ASHWINI for reconstruction in the second phase. In fact, the
experimental results suggest that only the first 50 eigenvectors are
sufficient for quality work, so we use only the first 50 eigenfaces in
the reconstruction/testing phase.
Here the first 50 eigenfaces are shown along with their corresponding
eigenvalues. Eigenfaces are obtained by mapping the eigenvectors to image
space. All eigenvectors are ordered in decreasing order of their
corresponding eigenvalues. The zeroth eigenface (here first in order) is
the average/mean face; every other eigenface represents a variation from
this average face and is viewed as a feature.
Two possible ways of implementing the reconstruction phase were either to
use one PC as both server and client, or to use two machines connected by
a network. In the latter case, a copy of the first 50 eigenvectors
calculated on the server PC is transmitted to the client PC, where it
serves as the model in the reconstruction/testing phase. We chose to
implement the former way, i.e. a single PC, which was sufficient for our
purpose.
Reconstruction/testing Phase:
We take a video of ASHWINI while speaking, grabbing 10 frames/sec with
the MIL grabber. For each grabbed image, the PCA module calculates 50
weights (integer values), one for each of the 50 eigenvectors
representing the model. These 50 weights are then used by PCA to
reconstruct the face in image space using the face model obtained in the
training phase. In our implementation, the reconstruction is done on the
same PC.
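The per-frame encode/decode step amounts to a projection onto the
eigenfaces followed by a weighted sum. Again a minimal sketch under the
placeholder names introduced above, not the project's actual code:

    import numpy as np

    def encode(face, mean_face, eigenfaces):
        """Project a flattened 64x64 face onto the k eigenfaces.
        The k integer weights are all that is needed per frame."""
        w = eigenfaces.T @ (face - mean_face)
        return np.round(w).astype(np.int16)   # 16-bit integers, as assumed
                                              # in the bitrate figures below

    def decode(weights, mean_face, eigenfaces):
        """Reconstruct the face in image space from the k weights."""
        return mean_face + eigenfaces @ weights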
Display:
With the help of the MIL software we allocate one display buffer for the
whole program and divide it into two child buffers: one for the original
grabbed faces and one for the reconstructed faces.
(Block diagrams of the network implementation for the training phase and
the reconstruction/testing phase.)
(A similar sort of implementation was done in the Virtual Dancer project
in ME768 by Soumyadeep Paul.)
Experiments and results to determine the values of parameters viz. the
size of the training set, the number of usable eigenvectors, the possible
frame rate, etc.
The training set used for these experiments consists of 30 images of
VAMSI. The pairs of original and reconstructed images, with accuracy
indicator:
Here all faces are known (i.e. training set = testing set), but only 20
of the 30 eigenvectors are used for reconstruction. Nevertheless, all
reconstructed faces show considerable similarity to the original faces in
facial expression.
Now we use only 10 eigenvectors for reconstruction. This gives somewhat
blurred images, and in some of them the facial expressions and the eye
and mouth movements do not match those of the original faces. This
implies that 10 eigenvectors are not sufficient for faithful
reconstruction.
Now we take 10 unknown faces and use 20 eigenvectors for reconstruction.
The results are just horrible: the reconstructed faces show neither the
eyes nor the mouth clearly, and the facial expressions do not match those
of the original images. This means that the training set of 30 face
images is not sufficient to deal with unknown faces.
Thus, from the experiments with still images, we draw the following
conclusions:
-
At least 30 eigenvectors are needed for satisfactory reconstruction of
still images.
-
A large training set with varying poses (orientation about the vertical
and horizontal axes) and different lighting conditions is needed. We
estimate that at least 200 images would do well.
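The accuracy indicator used in the figures above is not spelled out in
this report; one plausible, hypothetical choice is the root-mean-square
pixel error between original and reconstructed faces, which makes the
effect of the eigenvector count easy to quantify:

    import numpy as np

    def rmse(original, reconstructed):
        """RMS pixel error between two flattened 64x64 faces
        (an assumed metric, not necessarily the one used here)."""
        diff = original.astype(np.float64) - reconstructed.astype(np.float64)
        return np.sqrt(np.mean(diff ** 2))

    # Sweeping k reproduces the trend observed above:
    # for k in (10, 20, 30, 50):
    #     w = eigenfaces[:, :k].T @ (face - mean_face)
    #     print(k, rmse(face, mean_face + eigenfaces[:, :k] @ w))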
Final results
-
We grabbed a testing set of ASHWINI at 10 frames/sec for 20 seconds, gave
it to PCA, got one reconstructed image for each test face, and made a
video sequence of the original test images and the reconstructed images
at 10 frames/sec. See result1
-
We grabbed a testing set of ASHWINI wearing spectacles at a different
temporal resolution of 5 frames/sec, gave that set of images to the PCA
module, got the reconstructed images, and made a video sequence of the
original test images and the reconstructed images. See result2
-
We grabbed a testing set of ASHWINI under different lighting conditions,
i.e. with a bright light, at 10 frames/sec. See results
A few of the reconstructed faces from the video clips linked above:
From the first test case, i.e. the same lighting and without glasses:
the reconstruction is reasonably good.
From the test sequence with different lighting conditions: the
reconstruction is not at all good, because our training set does not
cover different lighting conditions.
From the test sequence with glasses, where the training sequence was
without spectacles: the reconstruction is poor, which means the model
created by the training set is unable to generalize the trained human
face.
Also, in the network implementation, the bitrate achieved in this work is
8 kbps for a temporal resolution of 10 fps with 64x64 images, and 4 kbps
for 5 fps. The compression ratio is (64x64x8) bits per raw image /
(50x16) bits per encoded image = 32768/800, i.e. approximately 41:1.
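These figures can be verified with a quick back-of-the-envelope
computation (assuming 8-bit greyscale frames and 16-bit weights, as
implied above):

    frame_bits_raw = 64 * 64 * 8            # raw 64x64 frame   = 32768 bits
    frame_bits_pca = 50 * 16                # 50 x 16-bit weights = 800 bits
    ratio = frame_bits_raw / frame_bits_pca # ~40.96, i.e. approx. 41:1
    kbps_10fps = frame_bits_pca * 10 / 1000 # 8.0 kbps at 10 frames/sec
    kbps_5fps  = frame_bits_pca * 5 / 1000  # 4.0 kbps at 5 frames/sec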
Conclusion
Limitations of this work that are yet to be addressed:
-
Poor reconstruction for a nonuniform background.
-
As the background is given equal importance to the face, the
reconstructed image is blurred.
-
The training set should be large enough to create a generalized model of
a particular person's face. We think the training set must include around
1000 faces with different lighting conditions, etc.
-
As the time required to calculate the eigenvectors is directly
proportional to the size of the training set, a large amount of offline
computation time is required for the matrix manipulation. This time can
be reduced by using efficient matrix manipulation algorithms.
Possible Extensions
-
To increase the speaker's scope of movement in the vertical plane, we can
use face-centering software that finds the face area in each frame and
apply PCA to the relevant parts only.
-
Using snake algorithms (e.g. ziplock snakes) we can reduce the influence
of the background.
-
For further reduction of the dimensionality, we can find the principal
components of specific parts of the face only; by sending the texture
information to the destination a priori, we can reconstruct the face
satisfactorily with these few principal components.
-
To reduce the reconstruction error over time, the receiver side can be
made to adjust the eigenvectors according to the distance vector for the
new/unknown faces it comes across during reconstruction.
-
Further lowering of the bitrate can be achieved if the model-based
approach is coupled with motion-based compression techniques such as
motion estimation: consider the interframe difference and send the
difference in weights instead of the weights themselves (a minimal sketch
follows this list).
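A minimal, hypothetical sketch of this last idea, reusing the weight
vectors produced by the encode step described earlier:

    import numpy as np

    def delta_encode(weights, prev_weights):
        """Send only the change in the 50 weights between consecutive
        frames; small deltas can be coded in far fewer than 16 bits."""
        return weights - prev_weights

    def delta_decode(delta, prev_weights):
        """Receiver adds the delta to its copy of the previous weights."""
        return prev_weights + delta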
Bibliography/WebInfo
Library documentation
The source code is available in ~vamsi/768/www/project/documentation.
Click for readme
This report was prepared by Ashwini Damle and Vamsi Chikati as part of
the project component of the course on Artificial Intelligence in
Engineering in the Jan-April semester of 2000. (Instructor: Amitabha
Mukerjee)