Control of a Robot by Voice Input

Artificial Intelligence ME 768 Jan-Apr 2000

Submitted by

Gatram Pradeep (97131)
Shalabh Gupta (97319)

IIT Kanpur : February 2000

What are we doing
Our motivation behind it
The example -- Our task
Sample Input Output
Past Work
Proposed methodology
Results
Conclusions
Applications
Links on the web
Source code
Bibliography

INTRODUCTION

Suppose you want to control a menu-driven system. What is the most striking property that you can think of?

The first thought that came to our mind is that the range of inputs in a menu-driven system is limited. In fact, a menu does nothing more than restrict the input domain. This characteristic is very useful when implementing menus in stand-alone systems. For example, think of the pine menu or a washing machine menu: how many distinct commands do they require?

MOTIVATION

Last year we both participated in robocarromines (Techkriti '99). We were using switches to control the various motions of our robots. Then Shalabh participated in sumo fighting (Yantriki '99) using a fully autonomous version of the robot NINJA. We felt the next logical step was to either go for some sort of wireless control mechanism or design a voice based control system. We decided to go for the latter.

Also a dancing robot competition is being organized by Ingenuity cell at Techkriti-Millennium, in which the robots have to dance to the tune of the music being played. This event was the one which got us to think about the concept of a voice controlled robot.

We are not aiming to build software that can recognize a large vocabulary. Our basic idea is to develop a menu-driven control for our robot, where the menu is voice driven. The ability to recognize a few words is enough for this kind of job. A person interacting with such a system would not need to use his hands for routine jobs, which is what we wish to achieve. This leads us to the main task of the project.

THE TASK

Our aim is to control the robot Ninja using voice commands.
Ninja can perform these basic tasks:

move forward
move back
turn right
turn left
hit
stop (stops doing the current job)

This can be considered a small menu consisting of 6 commands, so software that can recognize and distinguish these 6 commands from one another will do the job. The software to be developed therefore takes voice data as input and outputs the matched command.

SAMPLE INPUT OUTPUT

INPUT (Speaker speaks)    OUTPUT (Robot does)

forward                   moves forward
back                      moves back
right                     turns right
left                      turns left
hit                       hits the coin
stop                      stops doing current task

PAST WORK

A lot of work has been done earlier in the field of isolated word recognition. Using a traditional recognizer, an accuracy of around 60% has previously been obtained for both a 156 town name task and a 1108 road name task. Techniques presented in [Azzopardi/Semnani_et_al:1998] have resulted in an accuracy of 90% for an automated corporate directory system with 120,000 entries.

Speaker independent speech recognition technology that can be embedded on a single DSP chip has been developed by [Hoshimi/Yamada_et_al:1998] as an input method for rapidly spreading small portable information devices and advanced robotics applications. When the newly proposed noise robustness method was tested on a 100 word isolated vocabulary spoken by 50 subjects, a recognition accuracy of 94.7% was obtained under various noisy environments.

Software engineering for research and development in the area of signal processing is by no means unimportant. A programming paradigm that allows software components to be combined with each other in a way that recalls hardware plug-and-play, without the need for complex schedulers to control data flows, has been developed by [Dutoit/Shroeter:1998].

Similar earlier work in a limited input domain used wireless links, e.g. for remote control of electrical switches (this is currently one of the Ingenuity problems). About a year ago we read a newspaper report (The Hindu: Thursday Science & Technology Section) about such a project. A suggested application was for hospitalized patients, who usually depend on someone else to switch the lights, fan, etc. on and off. But what if the patient's hands are broken? A voice based system is the obvious choice in such a case.

METHODOLOGY

We take the voice data from the microphone using a soundcard and store it in an array. This array is passed to a function that extracts words from it (i.e. spoken words are extracted and quiet periods are dumped). Each word is then sent to a function that extracts frequency as a function of time. The result is the frequency vector of the spoken word. This vector is compared with the reference vectors, one per command, using the standard inner product of two vectors. The reference vector that matches is the one whose inner product with the spoken word's vector is greater than that of the other 5. The command corresponding to this reference vector is fed to Ninja. The electronic circuit mounted on Ninja then interprets the command and moves the robot accordingly.
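The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the function names, the RMS threshold and the 10 ms frame length are hypothetical, the 8 kHz sampling rate is taken from the Conclusions, and the vector length of 39 is assumed from the 60x39 matrix mentioned in the Results. "Frequency as a function of time" is interpreted here as the dominant frequency in each time slice, and the vectors are scaled to unit length before taking the inner product so that a reference cannot win merely by being larger; both choices are assumptions.

```python
import numpy as np

COMMANDS = ["forward", "back", "right", "left", "hit", "stop"]

def extract_words(samples, frame_len=80, threshold=0.02):
    """Split a recording into spoken words and drop the quiet periods.

    frame_len and threshold are assumed values (80 samples = 10 ms at 8 kHz).
    """
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    loud = np.sqrt((frames ** 2).mean(axis=1)) > threshold  # RMS per frame
    words, start = [], None
    for i, is_loud in enumerate(loud):
        if is_loud and start is None:
            start = i                      # a word begins
        elif not is_loud and start is not None:
            words.append(frames[start:i].ravel())  # a word ends
            start = None
    if start is not None:                  # recording ended mid-word
        words.append(frames[start:].ravel())
    return words

def frequency_vector(word, rate=8000, n_slices=39):
    """Dominant frequency (Hz) in each of n_slices time slices of a word.

    Assumes the word has at least n_slices samples.
    """
    vec = []
    for s in np.array_split(word, n_slices):
        spectrum = np.abs(np.fft.rfft(s * np.hanning(len(s))))
        freqs = np.fft.rfftfreq(len(s), d=1.0 / rate)
        vec.append(freqs[np.argmax(spectrum)])
    return np.array(vec)

def match_command(vec, references):
    """Index of the reference vector with the largest inner product."""
    def unit(v):
        return v / (np.linalg.norm(v) or 1.0)  # guard against a zero vector
    scores = [np.dot(unit(vec), unit(r)) for r in references]
    return int(np.argmax(scores))
```

Given one reference vector per entry of COMMANDS, `COMMANDS[match_command(frequency_vector(word), references)]` would give the command to send to the robot for each word returned by `extract_words`.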

RESULTS

Data acquisition using a microphone and soundcard has been successful.
The acquired data has been segmented into separate words, and quiet periods are dumped.
Frequency vectors of the words have been generated.
The reference matrix was generated using data acquired from 10 speakers in all. We used 12 sets of 6 vectors, but we got very bad results. The probable reason is the size of the matrix, 60x39: around 1000 data sets might have given better performance with a matrix of this size, but generating such a huge amount of data was not possible for us.
We got a recognition rate of around 85% for a single user. The system did not work for multiple users, i.e. it was user dependent. The performance was best when the reference files of only one person were included.

CONCLUSIONS
In this project we obtained a user dependent isolated word recognition system with a recognition accuracy of about 85% using six words. The accuracy can be improved further, and the system can be used for a larger number of words, if the noise conditions during training of the system are improved.
We were getting a peak SNR (signal to noise ratio) of about 20 dB, whereas under the best conditions an SNR of up to 35 dB can be obtained at an 8 kHz sampling rate. Also, the microphone we used did not filter out the bursts of air produced when we speak, which added a lot of noise to the input voice signal.
For a speaker independent word recognition system, however, we cannot use the technique discussed here. The frequency scales, the speed of speaking, and the concentration of signal power on different syllables vary widely from speaker to speaker (as depicted by the variations in the frequency and amplitude graphs for the same words in the methodology section). Thus for speaker independent systems we must use a better approach, such as Markov chain modeling.
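To put the 20 dB and 35 dB figures quoted above in perspective, the sketch below uses the usual definition of SNR as ten times the base-10 logarithm of the ratio of signal power to noise power. The report does not say how its peak SNR was measured, so the function names and the idea of comparing a speech segment with a noise-only segment are assumptions made for illustration.

```python
import numpy as np

def snr_db(signal, noise):
    """SNR in dB from the mean power of a speech segment and a noise-only segment."""
    p_signal = np.mean(np.square(signal))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10(p_signal / p_noise)

def power_ratio(db):
    """Signal-to-noise power ratio corresponding to a value in dB."""
    return 10.0 ** (db / 10.0)
```

On this definition, 20 dB corresponds to a signal power 100 times the noise power, and 35 dB to a ratio of about 3162, so the best-case conditions would give roughly 31.6 times more signal power relative to the noise than we had.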

APPLICATIONS
We believe such a system would find a wide variety of applications. Menu-driven systems such as e-mail readers, household appliances like washing machines and microwave ovens, and pagers and mobile phones will become voice controlled in the future. Our project may find applications there because the number of possible inputs is inherently limited. Using our software, these devices could be controlled over a network as well.

WEB LINKS

Signal Representation

Isolated Word Recognition

SOURCE CODE

BIBLIOGRAPHY

This proposal was prepared by Gatram Pradeep and Shalabh Gupta as a part of the project component in the Course on Artificial Intelligence in Engineering in the January semester of 2000.
(Instructor: Amitabha Mukerjee)