Gatram Pradeep (97131)
Shalabh Gupta (97319)
IIT Kanpur : February 2000
Suppose you want to control a menu driven system. What is the most striking property that you can think of?
The first thought that came to our mind is that the range of inputs in a menu driven system is limited. In fact, by using a menu all we are doing is limiting the input domain. This characteristic can be very useful in implementing menus in stand-alone systems. For example, think of the Pine menu or a washing machine menu: how many distinct commands do they require?
Last year we both participated in robocarromines (Techkriti '99). We were using switches to control the various motions of our robots. Then Shalabh participated in sumo fighting (Yantriki '99) using a fully autonomous version of the robot NINJA. We felt the next logical step was to either go for some sort of wireless control mechanism or design a voice based control system. We decided to go for the latter.
Also, a dancing robot competition is being organized by the Ingenuity cell at Techkriti-Millennium, in which the robots have to dance to the tune of the music being played. This event is what got us thinking about the concept of a voice controlled robot.
We are not aiming to build software that can recognize a large number of words. Our basic idea is to develop a menu driven control for our robot, where the menu is voice driven. A vocabulary of a few words is enough for such jobs. A person interacting with such a system would not need to use their hands for routine jobs, which is what we wish to achieve. This leads us to our main task in the project.
What we are aiming at is to control
the robot Ninja using voice commands.
Ninja can do these basic
tasks :-
[Table listing Ninja's basic tasks; contents not recovered.]
A lot of work has been done earlier in the field of isolated word recognition. Using a traditional recognizer, an accuracy of around 60% has previously been obtained for both a 156 town name task and a 1108 road name task. Techniques presented in [Azzopardi/Semnani_et_al:1998] have resulted in an accuracy of 90% for an automated corporate directory system with 120,000 entries.
[Hoshimi/Yamada_et_al:1998] have developed speaker independent speech recognition technology that can be embedded on a single DSP chip, intended as an input method for rapidly spreading small portable information devices and for advanced robotics applications. When their newly proposed noise robustness method was tested on a 100 isolated word vocabulary spoken by 50 subjects, a recognition accuracy of 94.7% was obtained under various noisy environments.
Software engineering for research and development in the area of signal processing is by no means unimportant. [Dutoit/Shroeter:1998] have developed a programming paradigm that allows software components to be combined with each other in a way that recalls hardware plug-and-play, without the need for complex schedulers to control data flows.
Earlier, similar work in a limited input domain was done using wireless control, e.g. remote control of electrical switches (this is currently one of the Ingenuity problems). We read a newspaper report about a year ago (The Hindu: Thursday Science & Technology Section) about such a project. A suggested application was for hospitalized patients, who usually depend on someone else to switch the lights, fan, etc. on and off. But what if the patient's hands are broken? A voice based system is the obvious choice in such a case.
We take the voice data from the microphone using a sound card and store it in an array. This array is passed to a function that extracts words from it (i.e. spoken words are extracted and quiet periods are discarded). Each word is then sent to a function that extracts frequency as a function of time; this is the frequency vector of the spoken word. The vector is compared with the reference vectors, one per command, using the standard inner product of two vectors. The reference vector that matches is the one whose inner product with the spoken word's vector is greater than that of the other 5. The command corresponding to this reference vector is fed to Ninja. The electronic circuit mounted on Ninja then interprets the command and moves the robot accordingly.
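The steps described above can be sketched roughly as follows. This is only an illustration and not the code used in the project: the function names, the frame length, the energy threshold, and the use of zero crossings to estimate frequency are all assumptions made for the sketch, since the text does not say how frequency is extracted. The sketch also normalizes the inner product so that the length of a vector does not dominate the comparison, which is a further assumption.

```python
import math

def extract_words(samples, frame_len=80, threshold=0.01):
    """Split a recording into words: runs of frames whose mean energy
    exceeds the threshold. Quiet frames are dropped (assumed criterion)."""
    words, current = [], []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            current.extend(frame)      # loud frame: part of a word
        elif current:
            words.append(current)      # quiet frame ends the current word
            current = []
    if current:
        words.append(current)
    return words

def frequency_vector(word, sample_rate=8000, n_points=8):
    """Estimate frequency as a function of time by counting zero crossings
    in n_points equal segments (hypothetical method; trailing samples that
    do not fill a segment are ignored)."""
    seg_len = max(1, len(word) // n_points)
    vec = []
    for i in range(n_points):
        seg = word[i * seg_len:(i + 1) * seg_len]
        crossings = sum(1 for a, b in zip(seg, seg[1:]) if (a < 0) != (b < 0))
        duration = len(seg) / sample_rate
        # A pure tone crosses zero twice per period.
        vec.append(crossings / (2 * duration) if duration else 0.0)
    return vec

def similarity(u, v):
    """Inner product of two vectors, normalized by their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recognize(word, references, sample_rate=8000):
    """Return the command whose reference vector has the largest
    inner product with the spoken word's frequency vector."""
    vec = frequency_vector(word, sample_rate)
    return max(references, key=lambda name: similarity(vec, references[name]))
```

In use, `references` would be a dictionary mapping each command name to the frequency vector of a training utterance, and `recognize` would be called on every word returned by `extract_words`. The command names themselves are left to the caller.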
CONCLUSIONS
In this project we obtained a user dependent isolated word recognition system with a recognition accuracy of about 85% using six words. The accuracy can be improved further, and the system can be used for more words, if the noise conditions during training of the system are improved.
We were getting a peak SNR (signal to noise ratio) of about 20 dB, whereas under the best conditions an SNR of up to 35 dB can be obtained at an 8 kHz sampling rate. Also, the microphone we used did not filter out the bursts of air produced when we speak words, which added a lot of noise to the input voice signal.
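To put the SNR figures in perspective, a decibel value can be computed from sample data as sketched below. This is an illustration only: the function names are hypothetical, and "peak SNR" is assumed here to mean the peak instantaneous power of the speech divided by the mean power of a noise-only recording, which the text does not define.

```python
import math

def mean_power(samples):
    """Mean squared amplitude of a block of samples."""
    return sum(x * x for x in samples) / len(samples)

def peak_snr_db(speech, noise):
    """Peak SNR in dB: peak speech power over mean noise power
    (assumed definition)."""
    peak_power = max(x * x for x in speech)
    return 10 * math.log10(peak_power / mean_power(noise))

def db_to_power_ratio(db):
    """Convert a decibel figure back to a plain power ratio."""
    return 10 ** (db / 10)
```

By this measure, 20 dB corresponds to a power ratio of 100 and 35 dB to a ratio of roughly 3162, so the best-case figure implies a noise floor about 30 times lower in power than the one we had.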
For a speaker independent word recognition system, however, we cannot use the technique discussed here. The frequency scales, the speed of speaking, and the concentration of signal power on different syllables vary widely from speaker to speaker (as depicted by the variations in the frequency and amplitude graphs for the same words in the methodology section). Thus for speaker independent systems we must use a better approach, such as Markov chain modeling.
APPLICATIONS
We believe such a system would find a wide variety of applications. Menu driven systems such as e-mail readers, household appliances like washing machines and microwave ovens, and pagers and mobile phones will become voice controlled in future. Our project may find applications there because the number of possible inputs is inherently limited. Using our software, these can be controlled through a network as well.
This proposal was prepared by Gatram Pradeep and Shalabh Gupta as a
part of the project component in the Course on Artificial Intelligence
in Engineering in the JAN semester of 2000 .
(Instructor : Amitabha
Mukerjee )