Our initial plan was to generate a reference matrix from all the frequency vectors and then use it for comparison, but this gave very poor results, so we switched to a slightly modified scheme. We use all the frequency vectors generated during training as our references. That means every set of training data contributes 6 vectors, one for each word. We also tried this with a single user, because preliminary experiments showed that the results are far better for a single user. For each set, we compute the inner product of the input with each of the 6 reference vectors and pick the maximum of the 6. The probability function of the command corresponding to that reference is raised. At the end, the command with the highest probability is given as output.
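The modified scheme can be sketched roughly as follows. This is only an illustration under our own assumptions: the text does not say by how much the probability function is raised, so the sketch simply adds one vote per set, and the names recognise_by_voting, training_sets and commands are hypothetical.

```python
def inner_product(a, b):
    """Sum of the element-wise products of two equally shaped 2-D lists."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def recognise_by_voting(sample, training_sets, commands):
    """Pick the command whose references win the most training sets.

    training_sets: list of sets; each set holds one reference vector per
                   command, in the same order as `commands`.
    Assumption: "raising the probability" is modelled as adding one vote.
    """
    scores = [0] * len(commands)
    for refs in training_sets:
        products = [inner_product(sample, ref) for ref in refs]
        # Index of the reference with the largest inner product in this set.
        best = max(range(len(products)), key=products.__getitem__)
        scores[best] += 1
    winner = max(range(len(scores)), key=scores.__getitem__)
    return commands[winner], scores
```

With 12 training sets of 6 words each, the loop runs 12 times and every pass casts exactly one vote.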
Generation of reference vector as a matrix
We use 12 sets of 6 words each, from 10 speakers, to generate our
reference matrix[6][][]. This makes 72 words in all. For each file we
calculate 6 inner products, one with each matrix[i]. For every file the
expected inner product is either +1 (perfect match) or -1 (mismatch).
error = abs(expected_inner_product - inner_product)
matrix_(k+1) = matrix_k + U*error*A
where
k is the number of iterations performed.
U is a constant (we have found that U=0.01 gives the best results).
A corresponds to the reference word under consideration.
The iterations are stopped once max_error falls below 0.1%.
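A single update step, taken literally from the formulas above, might look like the sketch below. The function name update_reference and the use of nested Python lists are our own assumptions. The original program is not shown in the text.

```python
def inner_product(a, m):
    """Sum of the element-wise products of two equally shaped 2-D lists."""
    return sum(x * y for row_a, row_m in zip(a, m) for x, y in zip(row_a, row_m))


def update_reference(matrix, a, expected, u=0.01):
    """Apply one step matrix <- matrix + U*error*A.

    Returns the new matrix and the error that was used for the step.
    `expected` is +1 for a match and -1 for a mismatch.
    """
    error = abs(expected - inner_product(a, matrix))
    new_matrix = [
        [m + u * error * x for m, x in zip(row_m, row_a)]
        for row_m, row_a in zip(matrix, a)
    ]
    return new_matrix, error
```

Because error is an absolute value, each step adds a non-negative multiple of A to the matrix. Whether the original program used a signed error for the -1 case is not stated in the text, so the sketch follows the formula as written.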
Computation of inner product
The inner product of A and matrix[i] is evaluated using
inner_product = Sum over j,k of A[j][k]*matrix[i][j][k]
(here j and k index the elements of A, not the iteration count).
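The summation above can be written as a short function. This is a minimal sketch that assumes A and the reference are stored as nested lists of the same shape. The shape check is our addition.

```python
def inner_product(a, m):
    """Return the sum over j, k of a[j][k] * m[j][k]."""
    if len(a) != len(m) or any(len(ra) != len(rm) for ra, rm in zip(a, m)):
        raise ValueError("A and the reference must have the same shape")
    return sum(a[j][k] * m[j][k] for j in range(len(a)) for k in range(len(a[j])))


# Worked example: 1*1 + 2*0 + 3*0 + 4*1 = 5
print(inner_product([[1, 2], [3, 4]], [[1, 0], [0, 1]]))  # prints 5
```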
Comparison with the 6 references
For comparison, the inner product of the input with each of the 6
matrix[i] is evaluated. The highest of these is taken as the match,
provided it is greater than a threshold.
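The comparison step might be sketched as follows. The threshold value is not given in the text, so it is left as a parameter. The name best_match and the choice to return None for "no match" are our own assumptions.

```python
def inner_product(a, m):
    """Sum of the element-wise products of two equally shaped 2-D lists."""
    return sum(x * y for row_a, row_m in zip(a, m) for x, y in zip(row_a, row_m))


def best_match(a, references, threshold):
    """Return the index of the best-matching reference, or None.

    references: the 6 reference matrices (any non-empty list works).
    A match is reported only if the highest inner product exceeds `threshold`.
    """
    products = [inner_product(a, ref) for ref in references]
    best = max(range(len(products)), key=products.__getitem__)
    return best if products[best] > threshold else None
```

Returning None lets the caller reject an utterance that resembles none of the references instead of forcing it onto the nearest word.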