Communication device and method for endpointing speech...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application

Reexamination Certificate

Details

C704S253000, C704S248000, C704S246000, C704S251000

active

06321197

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates generally to electronic devices with speech recognition technology. More particularly, the present invention relates to portable communication devices having speaker dependent speech recognition technology.
BACKGROUND OF THE INVENTION
As the demand for smaller, more portable electronic devices grows, consumers want additional features that enhance and expand the use of portable electronic devices. These electronic devices include compact disc players, two-way radios, cellular telephones, computers, personal organizers, speech recorders, and similar devices. In particular, consumers want to input information and control the electronic device using voice communication alone. It is understood that voice communication includes speech, acoustic, and other non-contact communication. With voice input and control, a user may operate the electronic device without touching the device and may input information and control commands at a faster rate than with a keypad. Moreover, voice-input-and-control devices eliminate the need for a keypad and other direct-contact input, thus permitting even smaller electronic devices.
Voice-input-and-control devices require proper operation of the underlying speech recognition technology. Basically, speech recognition technology analyzes a speech waveform within a speech data acquisition window for matching the waveform to word models stored in memory. If a match is found between the speech waveform and a word model, the speech recognition technology provides a signal to the electronic device identifying the speech waveform as the word associated with the word model.
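The matching step described above can be illustrated with a minimal Python sketch. It assumes each utterance has already been reduced to a fixed-length parameter vector and that a match means the nearest word model lies within a distance limit. The function name recognize, the Euclidean distance measure, and the max_distance limit are illustrative assumptions, not details taken from the patent.

```python
import math
from typing import Dict, Optional, Sequence


def recognize(params: Sequence[float],
              word_models: Dict[str, Sequence[float]],
              max_distance: float) -> Optional[str]:
    """Return the word whose model is nearest to params, or None if no match.

    Hypothetical sketch: a "match" is the closest stored model, accepted
    only when its Euclidean distance is within max_distance.
    """
    best_word = None
    best_distance = math.inf
    for word, model in word_models.items():
        distance = math.dist(params, model)  # Euclidean distance (Python 3.8+)
        if distance < best_distance:
            best_word, best_distance = word, distance
    # Reject the utterance when even the closest model is too far away.
    return best_word if best_distance <= max_distance else None
```

Returning None corresponds to the case in which the acquired waveform does not match any word model.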
A word model is created generally by storing parameters derived from the speech waveform of a particular word in memory. In speaker independent speech recognition devices, parameters of speech waveforms of a word spoken by a sample population of expected users are averaged in some manner to create a word model for that word. By averaging speech parameters for the same word spoken by different people, the word model should be usable by most if not all people.
In speaker dependent speech recognition devices, the user trains the device by speaking the particular word when prompted by the device. The speech recognition technology then creates a word model based on the input from the user. The speech recognition technology may prompt the user to repeat the word any number of times and then average the speech waveform parameters in some manner to create the word model.
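As a rough illustration of averaging training utterances into a word model, the Python sketch below takes an element-wise mean over the parameter vectors. The text says only that the parameters are averaged "in some manner", so the plain arithmetic mean, the fixed vector length, and the name build_word_model are assumptions made for this example.

```python
from typing import List, Sequence


def build_word_model(utterances: Sequence[Sequence[float]]) -> List[float]:
    """Average per-utterance parameter vectors element by element.

    Hypothetical sketch: every training utterance is assumed to be
    represented by a parameter vector of the same length.
    """
    if not utterances:
        raise ValueError("at least one training utterance is required")
    length = len(utterances[0])
    if any(len(u) != length for u in utterances):
        raise ValueError("all parameter vectors must have the same length")
    count = len(utterances)
    # Mean of the i-th parameter across all training utterances.
    return [sum(u[i] for u in utterances) / count for i in range(length)]
```

With only a few training utterances, one vector derived from a truncated or noisy recording shifts the mean noticeably, which is why consistent endpointing matters for speaker dependent training.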
To properly operate speech recognition technology, it is important to consistently identify the start and end endpoints of the speech utterances. Inconsistently identified endpoints may truncate words and may include extraneous noises within the speech waveform acquired by the speech recognition technology. Truncated words and/or noises may result in poorly trained models and cause the speech recognition technology not to work properly when the acquired speech waveform does not match any word model. In addition, truncated words and noises may cause the speech recognition technology to misidentify the acquired speech waveform as another word. In speaker dependent speech recognition devices, problems due to poor endpointing are aggravated when the speech recognition technology permits only a few training utterances.
The prior art describes techniques using threshold energy comparisons, zero crossings analysis, and cross correlation. These methods sequentially analyze speech features from left to right, right to left, or center outwards of the speech waveform. In these techniques, utterances containing pauses or gaps are problematic. Typically, pauses or gaps in an utterance are caused by the nature of the word, the speaking style of the user, and by utterances containing multiple words. Some techniques truncate the word or phrase at the gap, assuming erroneously that the endpoint has been reached. Other techniques use a maximum gap size criterion to combine detected parts of utterances with pauses into a single utterance. In such techniques, a pause longer than a predetermined threshold can cause parts of the utterance to be excluded.
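The gap problem can be seen in a small Python sketch of a left-to-right threshold energy detector with a maximum gap size. This is a generic illustration of the class of techniques described, not a reproduction of any cited method. The name threshold_endpoints and the parameters threshold and max_gap are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def threshold_endpoints(energies: Sequence[float],
                        threshold: float,
                        max_gap: int) -> Optional[Tuple[int, int]]:
    """Scan frame energies left to right and return (start, end) indices.

    Frames above threshold count as speech. Speech segments separated by
    at most max_gap low-energy frames are merged. A longer pause ends the
    utterance, so any speech after it is excluded.
    """
    start = None
    end = None
    gap = 0
    for i, energy in enumerate(energies):
        if energy > threshold:
            if start is None:
                start = i
            end = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap > max_gap:
                break  # pause too long: assume the endpoint was reached
    if start is None:
        return None
    return (start, end)
```

For the frame energies [0, 0, 5, 6, 0, 0, 0, 7, 8, 0] with a threshold of 1, a max_gap of 2 yields (2, 3) and drops the second burst of speech, while a max_gap of 3 yields (2, 8). The result depends entirely on how the pause length compares with the chosen gap limit.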
Accordingly, there is a need to consistently identify the start and end endpoints of a complete speech utterance within a speech acquisition window. There also is a need to ensure words or parts of words separated by pauses or gaps in the utterance are completely included within the utterance boundaries.
SUMMARY OF THE INVENTION
The primary object of the present invention is to provide a communication device and method for endpointing speech utterances. Another object of the present invention is to ensure that words and parts of words separated by gaps and pauses are included in the utterance boundaries. As discussed in greater detail below, the present invention overcomes the limitations of the existing art to achieve these objects and other benefits.
The present invention provides a communication device capable of endpointing speech utterances and including words and parts of words separated by gaps and pauses in the utterance boundaries. The communication device includes a microprocessor connected to communication interface circuitry, audio circuitry, memory, an optional keypad, a display, and a vibrator/buzzer. The audio circuitry is connected to a microphone and a speaker. The audio circuitry includes filtering and amplifying circuitry and an analog-to-digital converter. The microprocessor includes a speech/noise classifier and speech recognition technology.
The microprocessor analyzes a speech signal to determine speech waveform parameters within a speech acquisition window. The microprocessor utilizes the speech waveform parameters to determine the start and end points of the speech utterance. To make this determination, the microprocessor starts at a frame index based on the energy centroid of the speech utterance and analyzes the frames preceding and following the frame index to determine the endpoints. When a potential endpoint is identified, the microprocessor compares the cumulative energy at the potential endpoint to the total energy of the speech acquisition window to determine whether additional speech frames are present. Accordingly, gaps and pauses in the utterance will not result in an erroneous endpoint determination.
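The following Python sketch is one simplified reading of the centroid-based search described above. It is not the patented procedure itself, whose thresholds and exact steps are not given in this summary. It assumes non-negative per-frame energies, treats a frame at or below threshold as a potential endpoint, and keeps searching while more than tail_fraction of the window's total energy lies beyond that frame. The names centroid_endpoints, threshold, and tail_fraction are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def centroid_endpoints(energies: Sequence[float],
                       threshold: float,
                       tail_fraction: float = 0.02) -> Optional[Tuple[int, int]]:
    """Search outward from the energy centroid for utterance endpoints.

    A low-energy frame is accepted as an endpoint only when the energy
    remaining beyond it is a negligible share of the window total, so a
    pause inside the utterance does not stop the search.
    """
    total = sum(energies)
    if total <= 0:
        return None
    # Energy-weighted mean frame index, rounded to the nearest frame.
    centroid = round(sum(i * e for i, e in enumerate(energies)) / total)
    # prefix[i] holds the cumulative energy of frames 0 .. i-1.
    prefix: List[float] = [0.0]
    for e in energies:
        prefix.append(prefix[-1] + e)

    found: List[int] = []
    # Frames following the centroid: stop when little energy remains ahead.
    for i in range(centroid, len(energies)):
        if energies[i] > threshold:
            found.append(i)
        elif total - prefix[i] <= tail_fraction * total:
            break
    # Frames preceding the centroid: stop when little energy lies behind.
    for i in range(centroid - 1, -1, -1):
        if energies[i] > threshold:
            found.append(i)
        elif prefix[i + 1] <= tail_fraction * total:
            break
    if not found:
        return None
    return (min(found), max(found))
```

For the frame energies [0, 0, 5, 6, 0, 0, 0, 7, 8, 0] with a threshold of 1, the centroid falls at frame 5, inside the pause, and the function returns (2, 8). The cumulative energy check shows that substantial energy remains on both sides of the pause, so both bursts of speech are kept inside the utterance boundaries.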


