Artificial Intelligence CS365 - Homework 1

A : k-NN based classifier

Matlab code : File q1.m
Matlab code : File loadDigits.m
Remarks : Training set of 3000 samples, test set of 1000 samples
Graph : Error percentage vs k value
Observations about the data set based on the graph :
Maximum accuracy is obtained at k = 4, with only 101 of the 1000 test samples misclassified (10.1% error).
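The error-versus-k measurement above can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the Matlab code in q1.m. The function names (squared_distance, knn_predict, error_percent) and the toy two-dimensional data are hypothetical stand-ins for the digit images, and plain Euclidean distance with majority voting is assumed.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k):
    """Label of `query` by majority vote among its k nearest training samples."""
    nearest = sorted(
        range(len(train_x)),
        key=lambda i: squared_distance(train_x[i], query),
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def error_percent(train_x, train_y, test_x, test_y, k):
    """Percentage of test samples that are misclassified for a given k."""
    wrong = sum(
        knn_predict(train_x, train_y, x, k) != y
        for x, y in zip(test_x, test_y)
    )
    return 100.0 * wrong / len(test_x)


# Toy data (hypothetical): two well-separated classes in the plane.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [[0.2, 0.1], [4, 4], [1, 1]]
test_y = [0, 1, 1]  # the last label is deliberately wrong for its location

# Sweep k as in the error-vs-k graph.
errors_by_k = {k: error_percent(train_x, train_y, test_x, test_y, k)
               for k in (1, 3, 5)}
```

With 101 misclassified samples out of 1000, error_percent would return 10.1, which matches the figure quoted for k = 4.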

B : Manifold based modeling of MNIST digits

Isomap Brief :

Isomap solves the problem of dimensionality reduction. It combines geodesic distances computed on a weighted neighbourhood graph with metric MDS (Multi-Dimensional Scaling), so that the resulting embedding reflects the manifold structure of the data.
Geodesic distance according to Isomap : the sum of edge weights along the shortest path between two nodes.
The particular Isomap algorithm used here learns the global geometry of the dataset using local metric information.
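The geodesic distance defined above can be sketched as a shortest-path computation on a k-nearest-neighbour graph. The Python below uses only the standard library and is an illustration, not the contents of the Isomap Matlab files. The helper names (euclidean, knn_graph, geodesic_distances) and the five toy points are hypothetical, and Dijkstra's algorithm is one assumed choice of shortest-path method. The MDS step that follows in Isomap is not shown.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph as {node: {neighbour: weight}}."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: euclidean(points[i], points[j]),
        )
        for j in others[:k]:
            w = euclidean(points[i], points[j])
            graph[i][j] = w
            graph[j][i] = w  # keep the graph undirected
    return graph


def geodesic_distances(graph, source):
    """Dijkstra: sum of edge weights along the shortest path from source."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


# Toy data (hypothetical): points along an L-shaped curve.
points = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
graph = knn_graph(points, 2)
dist = geodesic_distances(graph, 0)
# dist[4] follows the curve, while euclidean(points[0], points[4]) cuts across it.
```

On this toy curve the geodesic distance between the two end points is 4.0, while the straight-line distance is about 2.83. That gap is the difference Isomap is built to capture.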

Using Euclidean Distance

a. 1 and 7

Matlab code : File q21.m
Matlab code : File loadDigits17.m
Matlab code : File L2_distance.m
Matlab code : File Isomap17.m

Graphs :
Two-dimensional Isomap embedding (with neighbourhood graph)
Residual Variance vs Isomap Dimensionality Curve
Observations :
The clusters of 1 and 7 are well separated and can be identified easily. Hence this algorithm works very effectively for 1 and 7.
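The residual variance plotted against Isomap dimensionality is commonly computed as 1 - R^2, where R is the linear correlation between the geodesic distances and the Euclidean distances in the embedding. The Python sketch below illustrates that definition. It is not taken from the Matlab files listed above: the function names and toy inputs are hypothetical, and only the pairs i < j are used, which is an assumption of this sketch.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residual_variance(geodesic, embedding):
    """1 - R^2 between geodesic distances and embedding distances.

    geodesic  : n x n matrix (list of lists) of geodesic distances
    embedding : list of n coordinate vectors in the low-dimensional space
    """
    n = len(embedding)
    geo, emb = [], []
    for i in range(n):
        for j in range(i + 1, n):
            geo.append(geodesic[i][j])
            diff = zip(embedding[i], embedding[j])
            emb.append(math.sqrt(sum((a - b) ** 2 for a, b in diff)))
    r = pearson_r(geo, emb)
    return 1.0 - r * r


# Toy data (hypothetical): five nodes on a chain with unit edge weights.
geodesic = [[abs(i - j) for j in range(5)] for i in range(5)]
good_embedding = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # preserves all distances
poor_embedding = [[0.0], [1.0], [0.0], [1.0], [0.0]]  # folds the chain

rv_good = residual_variance(geodesic, good_embedding)
rv_poor = residual_variance(geodesic, poor_embedding)
```

An embedding that preserves the geodesic distances gives a residual variance close to zero, and a distorted one gives a larger value. The "elbow" in the curve is therefore read as the intrinsic dimensionality.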

b. 4 and 9

Matlab code : File q22.m
Matlab code : File loadDigits29.m
Matlab code : File L2_distance.m
Matlab code : File Isomap29.m

Graphs :
Two-dimensional Isomap embedding (with neighbourhood graph)
Residual Variance vs Isomap Dimensionality Curve
Observations :
The clusters of 4 and 9 overlap in the middle of the embedding, so the two digits cannot be identified easily. Hence this algorithm is not very effective for 4 and 9.

c. All the digits

Matlab code : File q23.m
Matlab code : File loadDigits.m
Matlab code : File L2_distance.m
Matlab code : File Isomapad.m

Graphs :
Two-dimensional Isomap embedding (with neighbourhood graph)
Residual Variance vs Isomap Dimensionality Curve

Extra Credit :

Shown in parts (a), (b) and (c) above.

Using Tangent Distance

a. 1 and 7

Matlab code : File q21t.m
Matlab code : File loadDigits17.m
Matlab code : File tangent_d.m
Matlab code : File Isomap17.m

Graphs :
Two-dimensional Isomap embedding (with neighbourhood graph)
Observations :
As the graph shows, 1 and 7 can easily be separated. Hence this algorithm also works effectively for 1 and 7.

b. 4 and 9

Matlab code : File q22t.m
Matlab code : File loadDigits29.m
Matlab code : File tangent_d.m
Matlab code : File Isomap29.m

Graphs :
Two-dimensional Isomap embedding (with neighbourhood graph)
Observations :
4 and 9 are distinguished more easily than with Euclidean distance. Hence tangent distance works much better for 4 and 9.

c. All the digits

Matlab code : File q23t.m
Matlab code : File loadDigits.m
Matlab code : File tangent_d.m
Matlab code : File Isomapad.m

Graphs :
Two-dimensional Isomap embedding (with neighbourhood graph)
Observations :
The clusters of different digits are comparatively more distinguishable than those obtained with Euclidean distance.

C : Deep Learning

Procedure of Experiment i.e. Working of Deep Architectures

DBNs (Deep Belief Networks) are graphical models that learn to extract a deep hierarchical representation of the training data.
The process is as follows : train the first layer on the raw input. The output of the first layer then serves as the input representation for the second layer. Train the second layer on it, and repeat the process for all remaining layers. After fine-tuning, the final result is obtained.
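The greedy layer-wise procedure described above can be sketched in a few lines. The Python below is an illustration only and is not the Matlab code used for the experiments. The RBM class, pretrain_dbn, and the toy data are hypothetical names. Each layer is assumed to be a restricted Boltzmann machine trained with one step of contrastive divergence (CD-1). Updates are made per sample instead of per mini-batch, momentum is omitted, and the supervised fine-tuning stage is left out.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Minimal binary RBM (hypothetical helper for this sketch)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j]
                                            for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j]
                                            for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def cd1_update(self, v0, alpha, rng):
        """One CD-1 step on a single sample with learning rate alpha."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)  # reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += alpha * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += alpha * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.b_hid[j] += alpha * (h0[j] - h1[j])


def pretrain_dbn(data, sizes, numepochs=1, alpha=0.1, seed=0):
    """Train one RBM per entry of `sizes`, feeding each layer's output upward."""
    rng = random.Random(seed)
    layers = []
    layer_input = data
    for n_hidden in sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(numepochs):
            for v in layer_input:
                rbm.cd1_update(v, alpha, rng)
        layers.append(rbm)
        # The output of this layer is the input representation of the next one.
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return layers, layer_input


# Toy data (hypothetical): four binary vectors with two patterns.
data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
layers, top_representation = pretrain_dbn(data, sizes=[3, 2], numepochs=2)
```

Here sizes plays the role of "DBN Sizes" and numepochs and alpha mirror the DBN columns of the table. In a full pipeline the pretrained weights would initialise a feed-forward network that is then fine-tuned with labels.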
Here DBN sizes is the architecture of the network (the number of units in each hidden layer). Observations while tweaking:
1). Increasing the NN learning rate up to 5-6 decreases error.
2). Increasing numepochs in the DBN decreases error but increases the time the code takes to run.
3). A DBN batchsize of 100 is the optimal value. Decreasing or increasing it increases error.
4). Increasing alpha increases error.
5). Increasing numepochs in the NN reduces error but increases running time.
6). Decreasing the batchsize of the NN reduces error.
7). Making the neural network architecture more complex by increasing the number of layers does not have much impact on the error rate. Increasing the number of neurons per layer has a significant effect.

Table

DBN sizes          DBN numepochs  DBN batchsize  DBN momentum  DBN alpha  NN learning rate  NN numepochs  NN batchsize  Error (%)
[100 100]          1              100            0             1          1                 1             100           8.91
[100 100]          1              100            0             1          2                 1             100           7.56
[100 100]          1              100            0             1          3                 1             100           6.94
[100 100]          1              100            0             1          4                 1             100           6.52
[100 100]          1              100            0             1          5                 1             100           6.13
[100 100]          1              100            0             1          3                 2             100           5.72
[100 100]          1              100            0             1          3                 4             100           4.48
[100 100]          1              100            0             1          3                 6             100           4.13
[100 100]          1              100            0             1          3                 10            100           3.69
[100 100]          1              100            0             1          3                 10            50            3.35
[100 100]          1              100            0             1          3                 15            50            3.02
[100 100]          5              100            0             1          3                 15            50            2.81
[100 100]          15             100            0             1          3                 15            50            2.61
[100 100]          5              50             0             1          3                 15            50            3.08
[100 100]          5              100            0             2          3                 15            50            3.44
[100 100 100]      5              100            0             1          3                 15            50            3.03
[200 200 200 200]  5              100            0             1          4.5               30            50            2.27
[100 100]          5              100            0             1          4.5               30            50            2.12
[100 100]          5              100            0             1          4.5               30            50            2.27
[100 100]          1              200            0             1          1                 1             100           9.95
[300 300 300 300]  5              100            0             1          5                 15            20            2.68
[200 200 200 200]  5              100            0             1          4.5               15            20            2.82
[300 300]          5              100            0             1          4.5               15            20            2.04
[400 400]          8              100            0             1          4.5               20            10            2.00
After tweaking, I was able to achieve a minimum error of 2.00% using the parameters in the last row of the table above.