Reexamination Certificate
2000-01-12
2004-04-06
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S226000, C704S253000, C381S094300
active
06718302
ABSTRACT:
BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates generally to electronic speech recognition systems, and relates more particularly to a method for utilizing validity constraints in a speech endpoint detector.
2. Description of the Background Art
Implementing an effective and efficient method for system users to interface with electronic devices is a significant consideration of system designers and manufacturers. Human speech recognition is one promising technique that allows a system user to effectively communicate with selected electronic devices, such as digital computer systems. Speech typically consists of one or more spoken utterances, each of which may include a single word or a series of closely-spaced words forming a phrase or a sentence. In practice, speech recognition systems typically determine the endpoints (the beginning and ending points) of a spoken utterance to accurately identify the specific sound data intended for analysis. Conditions with significant ambient background-noise levels present additional difficulties when implementing a speech recognition system. Examples of such conditions may include speech recognition in automobiles or in certain manufacturing facilities. In such user applications, in order to accurately analyze a particular utterance, a speech recognition system may be required to selectively differentiate between a spoken utterance and the ambient background noise.
Referring now to FIG. 1, a diagram of speech energy 110 from an exemplary spoken utterance is shown. In FIG. 1, speech energy 110 is shown with time values displayed on the horizontal axis and with speech energy values displayed on the vertical axis. Speech energy 110 is shown as a data sample which begins at time 116 and which ends at time 118. Furthermore, the particular spoken utterance represented in FIG. 1 includes a beginning point $t_s$, which is shown at time 112, and also includes an ending point $t_e$, which is shown at time 114.
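As a rough illustration of what the beginning point $t_s$ and ending point $t_e$ mean for a framed signal, the following minimal Python sketch treats them as the first and last frames whose energy exceeds a fixed threshold. The non-overlapping framing, the threshold value, and the function names `frame_energies` and `find_endpoints` are assumptions for illustration only; a plain threshold like this is precisely what becomes unreliable under significant background noise, which is the problem the validity constraints below address.

```python
from typing import List, Optional, Tuple


def frame_energies(samples: List[float], frame_len: int) -> List[float]:
    """Energy (sum of squared samples) of each non-overlapping frame.

    Hypothetical framing scheme; trailing samples that do not fill a
    whole frame are ignored.
    """
    return [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def find_endpoints(energies: List[float], threshold: float) -> Optional[Tuple[int, int]]:
    """Return (t_s, t_e) as the first and last frame indices above threshold.

    Returns None when no frame exceeds the threshold (no utterance found).
    """
    above = [i for i, e in enumerate(energies) if e > threshold]
    if not above:
        return None
    return above[0], above[-1]


if __name__ == "__main__":
    # Low-level "silence", a burst of "speech", then silence again.
    signal = [0.01] * 40 + [0.5, -0.6, 0.7, -0.5] * 10 + [0.01] * 40
    print(find_endpoints(frame_energies(signal, frame_len=10), threshold=0.1))  # (4, 7)
```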
In many speech detection systems, the system user must identify a spoken utterance by manually indicating the beginning and ending points with a user input device, such as a push button or a momentary switch. This “push-to-talk” system presents serious disadvantages in applications where the system user is otherwise occupied, such as while operating an automobile in congested traffic conditions. A system that automatically identifies the beginning and ending points of a spoken utterance thus provides a more effective and efficient method of implementing speech recognition in many user applications.
Speech recognition systems may use many different techniques to determine endpoints of speech. However, in spite of attempts to select techniques that effectively and accurately allow the detection of human speech, robust speech detection under conditions of significant background noise remains a challenging problem. A system that utilizes effective techniques to perform robust speech detection in conditions with background noise may thus provide a more useful and powerful method of speech recognition. Therefore, for all the foregoing reasons, implementing an effective and efficient method for system users to interface with electronic devices remains a significant consideration of system designers and manufacturers.
SUMMARY OF THE INVENTION
In accordance with the present invention, a method for utilizing validity constraints in a speech endpoint detector is disclosed. In one embodiment, a validity manager preferably includes, but is not limited to, a pulse width module, a minimum power module, a duration module, and a short-utterance minimum power module.
In accordance with the present embodiment, the pulse width module may advantageously utilize several constraint variables during the process of identifying a valid reliable island for a particular utterance. The pulse width module preferably measures individual pulse widths in speech energy, and may then store each pulse width in constraint value registers as a single pulse width (SPW) value. The pulse width module may then reference the SPW values to eliminate any energy pulses that are shorter than a pre-determined duration.
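The SPW step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes a "pulse" is a maximal run of consecutive frames whose energy value exceeds a threshold, that SPW is measured in frames, and that `find_pulses`, `single_pulse_widths`, `drop_short_pulses`, and `min_spw` are hypothetical names.

```python
from typing import List, Optional, Tuple

Pulse = Tuple[int, int]  # (start_frame, end_frame), end inclusive


def find_pulses(energy: List[float], threshold: float) -> List[Pulse]:
    """Locate maximal runs of consecutive frames whose energy exceeds threshold."""
    pulses: List[Pulse] = []
    start: Optional[int] = None
    for i, value in enumerate(energy):
        if value > threshold and start is None:
            start = i  # a pulse begins
        elif value <= threshold and start is not None:
            pulses.append((start, i - 1))  # the pulse ended on the previous frame
            start = None
    if start is not None:  # pulse still open at the end of the data
        pulses.append((start, len(energy) - 1))
    return pulses


def single_pulse_widths(pulses: List[Pulse]) -> List[int]:
    """SPW value of each pulse, in frames."""
    return [end - start + 1 for start, end in pulses]


def drop_short_pulses(pulses: List[Pulse], min_spw: int) -> List[Pulse]:
    """Eliminate pulses whose SPW is shorter than the hypothetical min_spw."""
    return [p for p, w in zip(pulses, single_pulse_widths(pulses)) if w >= min_spw]
```

For example, an energy track of `[0, 5, 0, 0, 6, 6, 6, 0, 7, 7, 0]` with a threshold of 1.0 yields pulses `(1, 1)`, `(4, 6)`, and `(8, 9)` with SPW values 1, 3, and 2; with `min_spw=2` the one-frame pulse is discarded.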
The pulse width module may also measure the gap durations between individual pulses in speech energy (corresponding to the foregoing SPW values), and may then store each gap duration in constraint value registers as a pulse gap (PG) value. The pulse width module may then reference the PG values to control the maximum allowed gap between energy pulses that are to be included in the total pulse width (TPW) value discussed below.
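A minimal Python sketch of the PG measurement, assuming pulses are given as `(start_frame, end_frame)` pairs in time order and that PG is the number of frames strictly between two adjacent pulses. The names `pulse_gaps`, `group_by_max_gap`, and `max_pg` are hypothetical; the grouping simply shows how a maximum-gap limit decides which pulses may be considered together.

```python
from typing import List, Tuple

Pulse = Tuple[int, int]  # (start_frame, end_frame), end inclusive


def pulse_gaps(pulses: List[Pulse]) -> List[int]:
    """PG value between each pair of adjacent pulses, in frames."""
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(pulses, pulses[1:])]


def group_by_max_gap(pulses: List[Pulse], max_pg: int) -> List[List[Pulse]]:
    """Split the pulse sequence wherever a gap exceeds max_pg.

    Pulses in the same group are separated by gaps no larger than max_pg.
    """
    if not pulses:
        return []
    groups = [[pulses[0]]]
    for gap, pulse in zip(pulse_gaps(pulses), pulses[1:]):
        if gap <= max_pg:
            groups[-1].append(pulse)  # close enough: same group
        else:
            groups.append([pulse])  # gap too long: start a new group
    return groups
```

With pulses `(4, 6)`, `(8, 9)`, and `(20, 22)`, the PG values are 1 and 10, so a `max_pg` of 3 keeps the first two pulses together and isolates the third.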
In the present embodiment, the validity manager may advantageously utilize the pulse width module to detect a valid reliable island when speech energy includes multiple speech energy pulses within a certain pre-determined time period “P”. In certain embodiments, a beginning point for a reliable island is detected when sequential values of the detection parameter DTF are greater than a reliable island threshold $T_{sr}$ for a given number P of consecutive frames. However, for multi-syllable words, a single syllable may not last long enough to satisfy the condition of P consecutive frames.
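The consecutive-frame condition, and why it can fail for multi-syllable words, can be sketched in Python as follows. This is a minimal sketch that assumes one DTF value per frame; `reliable_island_start`, `dtf`, `t_sr`, and `p` are hypothetical names standing in for the detection parameter DTF, the threshold $T_{sr}$, and the frame count P.

```python
from typing import List, Optional


def reliable_island_start(dtf: List[float], t_sr: float, p: int) -> Optional[int]:
    """Index of the first frame of the first run of p consecutive frames with DTF > t_sr.

    Returns None if no run of p consecutive frames exceeds the threshold.
    """
    run = 0
    for i, value in enumerate(dtf):
        run = run + 1 if value > t_sr else 0  # any low frame resets the run
        if run == p:
            return i - p + 1
    return None
```

Two 3-frame syllables separated by a single low frame (`[0, 9, 9, 9, 0, 9, 9, 9, 0]`) never produce 5 consecutive frames above a threshold of 1.0, so `p=5` finds no island, whereas a single 5-frame pulse starting at frame 2 does. This is the multi-syllable failure case described above.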
The pulse width module may therefore preferably sum the widths of the energy pulses identified by SPW values (subject to the foregoing PG value constraint) to produce a total pulse width (TPW) value, which may also be stored in the constraint value registers. The validity manager may thus detect a reliable island whenever a TPW value is greater than a reliable island threshold $T_{sr}$ for a given number of consecutive frames “P”.
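One plausible reading of the TPW constraint is sketched below in Python: SPW values of successive pulses are accumulated while each PG stays within a maximum gap, and a reliable island is reported once the accumulated TPW reaches P frames. This interpretation, and the names `total_pulse_width_island`, `max_pg`, and `p`, are assumptions for illustration, not the patent's stated implementation.

```python
from typing import List, Optional, Tuple

Pulse = Tuple[int, int]  # (start_frame, end_frame), end inclusive


def total_pulse_width_island(
    pulses: List[Pulse], max_pg: int, p: int
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frames of the first pulse group whose TPW reaches p.

    Pulses are accumulated while the gap (PG) to the previous pulse is at
    most max_pg; a larger gap resets the running TPW total.
    """
    tpw = 0
    group_start = 0
    prev_end: Optional[int] = None
    for start, end in pulses:
        if prev_end is None or start - prev_end - 1 > max_pg:
            tpw = 0  # PG constraint violated (or first pulse): start a new group
            group_start = start
        tpw += end - start + 1  # add this pulse's SPW to the TPW
        prev_end = end
        if tpw >= p:
            return group_start, end
    return None
```

Under this reading, the two 3-frame syllables `(1, 3)` and `(5, 7)` give a TPW of 6 and form a reliable island for `p=5` when `max_pg=2`, but not when `max_pg=0`, because the one-frame gap then resets the total.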
In addition, the validity manager may preferably utilize the minimum power module to ensure that speech energy below a pre-determined level is not classified as a valid utterance, even when the pulse width module identifies a valid reliable island. Therefore, in the present embodiment, the minimum power module preferably compares the peak magnitude of segments of the speech energy to a pre-determined constant value, and rejects as invalid any utterance whose peak speech-energy magnitude is below that constant.
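The minimum power check amounts to a single peak comparison. The Python sketch below assumes a segment is given as a sequence of per-frame energy values; `passes_minimum_power` and `min_peak` are hypothetical names, and `min_peak` stands in for the pre-determined constant.

```python
from typing import Sequence


def passes_minimum_power(segment_energy: Sequence[float], min_peak: float) -> bool:
    """Accept the segment only if its peak energy magnitude reaches min_peak.

    An empty segment has no peak and is rejected.
    """
    if not segment_energy:
        return False
    return max(abs(e) for e in segment_energy) >= min_peak
```

A segment such as `[0.1, 0.4, 0.2]` passes with `min_peak=0.3`, while `[0.1, 0.2]` is rejected even if it was found inside a reliable island.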
In the present embodiment, the validity manager also preferably utilizes the duration module to impose duration constraints on a given detected segment of speech energy. Therefore, the duration module may preferably compare the duration of a detected segment of speech energy to two pre-determined constant duration values. In accordance with the present invention, segments of speech with durations that are greater than a first constant are preferably classified as noise. Segments of speech with durations that are less than a second constant are preferably analyzed further by the short-utterance minimum power module as discussed below.
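The two duration constants partition detected segments into three cases. In the minimal Python sketch below, durations are assumed to be measured in frames, and `DurationClass`, `classify_duration`, `max_frames` (the first constant), and `short_frames` (the second constant) are hypothetical names.

```python
from enum import Enum


class DurationClass(Enum):
    NOISE = "noise"    # longer than the first constant
    SHORT = "short"    # shorter than the second constant; needs a further check
    NORMAL = "normal"  # between the two constants


def classify_duration(n_frames: int, max_frames: int, short_frames: int) -> DurationClass:
    """Classify a detected segment by its duration in frames."""
    if n_frames > max_frames:
        return DurationClass.NOISE
    if n_frames < short_frames:
        return DurationClass.SHORT  # hand off to the short-utterance power check
    return DurationClass.NORMAL
```

With `max_frames=200` and `short_frames=20`, a 300-frame segment is classified as noise, a 10-frame segment is flagged as short, and a 50-frame segment passes the duration constraint.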
In the present embodiment, the validity manager may preferably utilize the short-utterance minimum power module to distinguish an utterance of short duration from background pulse noise. To be distinguishable from background noise, a short utterance preferably has a relatively high energy value.
Therefore, the short-utterance minimum power module may preferably compare the peak magnitude of segments of the speech energy to a pre-determined constant value that is larger than the pre-determined constant utilized by the foregoing minimum power module. The present invention thus efficiently and effectively implements a method for utilizing validity constraints in a speech endpoint detector.
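The following Python sketch shows how the stricter short-utterance threshold layers on top of the ordinary minimum power check. It assumes per-frame energy values and a short-duration limit in frames; `accept_segment`, `short_frames`, `min_peak`, and `short_min_peak` are hypothetical names, and the two peak constants follow the ordering described above (the short-utterance constant is the larger one).

```python
from typing import Sequence


def accept_segment(
    segment_energy: Sequence[float],
    short_frames: int,
    min_peak: float,
    short_min_peak: float,
) -> bool:
    """Apply the minimum power check, plus a stricter one for short segments."""
    if short_min_peak <= min_peak:
        raise ValueError("short_min_peak must be larger than min_peak")
    if not segment_energy:
        return False
    peak = max(abs(e) for e in segment_energy)
    if peak < min_peak:
        return False  # fails the ordinary minimum power constraint
    if len(segment_energy) < short_frames and peak < short_min_peak:
        return False  # short and not loud enough: treat as pulse noise
    return True
```

With `min_peak=0.5` and `short_min_peak=0.8`, a 3-frame segment peaking at 0.9 is accepted, a 3-frame segment peaking at 0.6 is rejected as likely pulse noise, and a 10-frame segment at 0.6 is accepted because only the ordinary threshold applies to it.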
Chen Ruxin
Olorenshaw Lex
Tanaka Miyuki
Wu Duanpei
Koerner Gregory J.
Simon & Koerner LLP
Storm Donald L.