Artificial Intelligence ME 768
FINAL PROJECT REPORT
PCA based low-resolution
face reconstruction
Ashwini Damle (9911106)
Vamsi Chikati (9911154)
IIT Kanpur : April 2000
Motivation
Video telecommunication applications viz. video telephony, video
conferencing, and distance learning are the direct motivation for this
project. These applications involve lengthy, real-time video sequences
whose transmission over the network and storage demand much of the
bandwidth of the medium. Typically, a compressed video stream at 20-25
frames/second exceeds the 64 Kbps capacity of the medium. Transform-based
video coding standards viz. H.261 achieve low-bitrate real-time video
coding; in model-based methods, however, a satisfying compression ratio
is yet to be achieved, and there is still a tradeoff between bit rate and
the quality of the reconstructed video sequence.
The distinguishing feature of the above-mentioned telecommunication
applications is that the image sequence mostly contains the face of one
and the same person, with slight changes in scale, lighting conditions,
and/or orientation about the vertical axis. In a model-based approach,
this knowledge can be used effectively in the codec (coder-decoder) for
such a video stream, so that transmission and storage become much more
efficient than with existing transform-based standards viz. H.261.
One such technique is Principal Components Analysis (PCA), which aims at
dimensionality reduction of the image space, provided the image contains
pixels that are highly correlated.
The present work is a step towards using PCA for video coding in
applications viz. distance learning that deal primarily with the human
face. The experiments in this project aim at testing PCA for its
suitability as a video-coding technique and at setting at least a
preliminary benchmark for tuning PCA towards "very low bitrate + quality"
video coding. Obvious experiments include comparing the reconstructed
video stream with the original sequence for similarity in gaze, head
orientation, and facial expressions, to name a few; testing the
synchronization between audio and video; and obtaining the compression
ratio and the bitrate requirement of the encoded stream for various
spatial and temporal resolutions.
These experiments will form part of the thesis work of Ashwini Damle,
one of the project-group participants.
Sample Input/Expected Output :
(Original image and reconstructed image with accuracy indicator.)
Relation to past work
Past work was published in the paper titled "Linear Combination of Face
Views for Low Bit Rate Face Video Compression" by Ioannis Koufakis and
Bernard Buxton (Download paper). The authors proposed representing each
new face as a linear combination of three basis views of the same
person's face. Changes in the eyes and mouth regions are encoded using
PCA. Using control points, they compute the 14 coefficients of the
linear combination; this information is enough to transmit and
reconstruct the face image. The limitation of this work is that combining
the results for the face with those for the eyes and mouth causes
distortion in the reconstructed face and also produces incorrect
'expressions' on the face.
In our work, PCA analyses the face image holistically, not in parts. As
we constrain the complete image size to 64x64, our results satisfy the
low-bitrate constraint.
The reason we opted for PCA is the 'dimensionality reduction' that PCA
can achieve. This is tested and demonstrated in the work of Sami Romdhani
titled "Face Recognition Using Principal Components Analysis". The main
aim of his work is to show that PCA is the best method to reduce the
working space, and that face space is the optimal space for face
description.
(Figure: image space and the face cluster.)
Because our main aim is to reduce the transmission overhead, we
concentrate on dimensionality reduction, and PCA was one of our natural
choices. Our work is concerned with reconstruction only, whereas his work
also tests PCA for the face recognition application.
For face reconstruction, Kenneth B. Russel has successfully used
eigenspaces to recover the 3D structure and visual appearance of the
human head in real time. As our basic aim is to reconstruct using less
data, we also use eigenvectors to encode and their dot products to
reconstruct.
Methodology
Hardware Used:
Video Camera
Card for getting analog video input signal
PC with single monitor
Software Used:
MIL (Matrox Imaging Library): image grabbing and display software
Principal Components Analysis (PCA)
software developed by the author of the referenced paper.
OS: Windows (NT/98)
Implementation
Setup:
For the face images, we make the background black and feed the grabbed
face images directly to the PCA module, so there is no overhead of
locating the face region in the images.
Execution:
We execute our program in two phases: a training phase and a
reconstruction/testing phase.
Training phase:
We grab a video stream of 200 images of ASHWINI at 10 frames/sec with all
possible variations in pose, expression, etc. The training video sequence
is trainclip.gif (run the file using: xanim trainclip.gif on Linux). This
grabbing is done with the help of the MIL software. We give these 200
images to PCA to obtain 200 eigenvectors, which serve as the face model
of ASHWINI for reconstruction in the second phase. In fact, the
experimental results suggest that only the first 50 eigenvectors are
sufficient for quality work, so we use only the first 50 eigenfaces in
the reconstruction/testing phase.
Here the first 50 eigenfaces are shown along with their corresponding
eigenvalues. Eigenfaces are obtained by mapping the eigenvectors to image
space. All eigenvectors are ordered in decreasing order of their
corresponding eigenvalues. The zeroth eigenface (here first in order) is
the average/mean face; every other eigenface represents a variation from
this average face and is viewed as a feature.
Two possible ways of implementing the reconstruction phase were either to
use one PC as both server and client, or to use two machines connected by
a network. In the latter case, a copy of the first 50 eigenvectors
calculated on the server PC is transmitted to the client PC, where it
serves as the model in the reconstruction/testing phase. We chose to
implement the former way, i.e. a single PC, which was sufficient for our
purpose.
Reconstruction/testing Phase:
We take a video of ASHWINI while speaking, grabbing 10 frames/sec with
the MIL grabber. For each grabbed image, the PCA module calculates 50
weights (integer values), one for each of the 50 eigenvectors
representing the model. These 50 weights are then used by PCA to
reconstruct the face in image space using the face model obtained in the
training phase. In our implementation, the reconstruction is done on the
same PC.
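The per-frame encode/decode step amounts to a projection onto the
eigenfaces followed by a weighted sum. Again a minimal sketch under the
placeholder names introduced above, not the project's actual code:

    import numpy as np

    def encode(face, mean_face, eigenfaces):
        """Project a flattened 64x64 face onto the k eigenfaces.
        The k integer weights are all that is needed per frame."""
        w = eigenfaces.T @ (face - mean_face)
        return np.round(w).astype(np.int16)   # 16-bit integers, as assumed
                                              # in the bitrate figures below

    def decode(weights, mean_face, eigenfaces):
        """Reconstruct the face in image space from the k weights."""
        return mean_face + eigenfaces @ weights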
Display:
With the help of the MIL software we allocate one display buffer for the
whole program and divide it into two child buffers: one for the original
grabbed faces and one for the reconstructed faces.
(Block diagrams of the network implementation for the training phase and
the reconstruction/testing phase.)
(A similar sort of implementation was done in the Virtual Dancer project
in ME768 by Soumyadeep Paul.)
Experiments and results to determine the values of parameters viz. the
size of the training set, the number of usable eigenvectors, the possible
frame rate, etc.
The training set used for these experiments consists of 30 images of
VAMSI. The pairs of original and reconstructed images, with accuracy
indicator:
Here all faces are known (i.e. training set = testing set), but only 20
of the 30 eigenvectors are used for reconstruction. Nevertheless, all
reconstructed faces show considerable similarity to the original faces in
facial expression.
Now we use only 10 eigenvectors for reconstruction. This gives somewhat
blurred images, and in some of them the facial expressions and the eye
and mouth movements do not match those of the original faces. This
implies that 10 eigenvectors are not sufficient for faithful
reconstruction.
Now we take 10 unknown faces and use 20 eigenvectors for reconstruction.
The results are just horrible: the reconstructed faces show neither the
eyes nor the mouth clearly, and the facial expressions do not match those
of the original images. This means that the training set of 30 face
images is not sufficient to deal with unknown faces.
Thus, from the experiments with still images, we draw the following
conclusions:
-
At least 30 eigenvectors are needed for satisfactory reconstruction of
still images.
-
A large training set with varying poses (orientation about the vertical
and horizontal axes) and different lighting conditions is needed. We
estimate that at least 200 images would do well.
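The accuracy indicator used in the figures above is not spelled out in
this report; one plausible, hypothetical choice is the root-mean-square
pixel error between original and reconstructed faces, which makes the
effect of the eigenvector count easy to quantify:

    import numpy as np

    def rmse(original, reconstructed):
        """RMS pixel error between two flattened 64x64 faces
        (an assumed metric, not necessarily the one used here)."""
        diff = original.astype(np.float64) - reconstructed.astype(np.float64)
        return np.sqrt(np.mean(diff ** 2))

    # Sweeping k reproduces the trend observed above:
    # for k in (10, 20, 30, 50):
    #     w = eigenfaces[:, :k].T @ (face - mean_face)
    #     print(k, rmse(face, mean_face + eigenfaces[:, :k] @ w))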
Final results
-
We grabbed a testing set of ASHWINI at 10 frames/sec for 20 seconds, gave
it to PCA, got one reconstructed image for each test face, and made a
video sequence of the original test images and the reconstructed images
at 10 frames/sec. See result1
-
We grabbed a testing set of ASHWINI wearing spectacles at a different
temporal resolution of 5 frames/sec, gave that set of images to the PCA
module, got the reconstructed images, and made a video sequence of the
original test images and the reconstructed images. See result2
-
We grabbed a testing set of ASHWINI under different lighting conditions,
i.e. with a bright light, at 10 frames/sec. See results
A few of the reconstructed faces from the video clips linked above:
From the first test case, i.e. the same lighting and without glasses:
the reconstruction is reasonably good.
From the test sequence with different lighting conditions: the
reconstruction is not at all good, because our training set does not
cover different lighting conditions.
From the test sequence with glasses, where the training sequence was
without spectacles: the reconstruction is poor, which means the model
created by the training set is unable to generalize the trained human
face.
Also, in the network implementation, the bitrate achieved in this work is
8 kbps for a temporal resolution of 10 fps with 64x64 images, and 4 kbps
for 5 fps. The compression ratio is (64x64x8) bits per raw image /
(50x16) bits per encoded image = 32768/800, i.e. approximately 41:1.
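These figures can be verified with a quick back-of-the-envelope
computation (assuming 8-bit greyscale frames and 16-bit weights, as
implied above):

    frame_bits_raw = 64 * 64 * 8            # raw 64x64 frame   = 32768 bits
    frame_bits_pca = 50 * 16                # 50 x 16-bit weights = 800 bits
    ratio = frame_bits_raw / frame_bits_pca # ~40.96, i.e. approx. 41:1
    kbps_10fps = frame_bits_pca * 10 / 1000 # 8.0 kbps at 10 frames/sec
    kbps_5fps  = frame_bits_pca * 5 / 1000  # 4.0 kbps at 5 frames/sec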
Conclusion
Limitations of this work that are yet to be addressed:
-
Poor reconstruction for a nonuniform background.
-
As the background is given equal importance to the face, the
reconstructed image is blurred.
-
The training set should be large enough to create a generalized model of
a particular person's face. We think the training set must include around
1000 faces with different lighting conditions, etc.
-
As the time required to calculate the eigenvectors is directly
proportional to the size of the training set, a large amount of offline
computation time is required for the matrix manipulation. This time can
be reduced by using efficient matrix manipulation algorithms.
Possible Extensions
-
To increase the speaker's scope of movement in the vertical plane, we can
use face-centering software that finds the face area in each frame and
apply PCA to the relevant parts only.
-
Using snake algorithms (e.g. ziplock snakes) we can reduce the influence
of the background.
-
For further reduction of the dimensionality, we can find the principal
components of specific parts of the face only; by sending the texture
information to the destination a priori, we can reconstruct the face
satisfactorily with these few principal components.
-
To reduce the reconstruction error over time, the receiver side can be
made to adjust the eigenvectors according to the distance vector for the
new/unknown faces it comes across during reconstruction.
-
Further lowering of the bitrate can be achieved if the model-based
approach is coupled with motion-based compression techniques such as
motion estimation: consider the interframe difference and send the
difference in weights instead of the weights themselves (a minimal sketch
follows this list).
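A minimal, hypothetical sketch of this last idea, reusing the weight
vectors produced by the encode step described earlier:

    import numpy as np

    def delta_encode(weights, prev_weights):
        """Send only the change in the 50 weights between consecutive
        frames; small deltas can be coded in far fewer than 16 bits."""
        return weights - prev_weights

    def delta_decode(delta, prev_weights):
        """Receiver adds the delta to its copy of the previous weights."""
        return prev_weights + delta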
Bibliography/WebInfo
Library documentation
The source code is available in ~vamsi/768/www/project/documentation.
Click for readme
This report was prepared by Ashwini Damle and Vamsi Chikati as part of
the project component of the course on Artificial Intelligence in
Engineering in the Jan-April semester of 2000. (Instructor: Amitabha
Mukerjee)