Speech segment detection and word recognition

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S254000, C704S251000, C704S231000

active

06317711

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech segment detection method, and a speech recognition system and method in which the speech segment detection method is utilized. Further, the present invention relates to a computer-readable medium storing program code instructions that cause a processor to carry out the speech segment detection method.
2. Description of the Related Art
Speech recognition by machine has proven an extremely difficult task. One complicating factor is that, unlike written text, no clear spacing exists between spoken words; speakers typically utter full phrases or sentences without pause. Further, acoustic variability in the speech signal typically precludes an unambiguous mapping to a sequence of words or subword units, such as pronunciations of consonants and vowels. One major source of variability in speech is coarticulation, or the tendency for the acoustic characteristics of a given speech sound or phone sound to differ depending upon the phonetic context in which it is produced.
Speech recognizers can be categorized by the speaking styles, vocabularies, and language models that they accommodate. Isolated word recognizers require speakers to insert brief pauses between individual words. Continuous speech recognizers operate on fluent speech, but typically employ strict language models, or grammars, to limit the number of allowable word sequences. Wordspotters operate on fluent speech as input. However, rather than providing full transcription, wordspotters selectively locate relevant words or phrases in an utterance. Wordspotting is useful both in information-retrieval tasks based on keyword indexing and as an alternative to isolated word recognition in voice command applications.
In principle, the wordspotting technique does not require detecting a speech segment in the input speech signal. However, in practical applications, there are some cases in which the detection of speech segments prior to the recognition process is needed to determine word recognition timing or determine a selected range of the input speech signal to be recognized. If wordspotting is applied, in such cases, to the entire range of the input speech without detecting the speech segments, the processing load will be significantly increased, which is detrimental to quickly obtaining the results of recognition. Hence, the detection of speech segments in the input speech signal is very useful for practical applications of speech recognition.
For example, Japanese Laid-Open Patent Application No. 1-244497 discloses a speech segment detection method of one type. In this detection method, an average noise power over some frames of an input signal just following the starting time of a speech segment detection process is calculated, and a speech segment in the input speech signal is detected through the comparison with a threshold level that is varied by the average noise power.
However, the conventional method in the above publication has a problem in effectively detecting the speech segment when a relatively large noise (e.g., a key-depressing sound) takes place just following the time the speech segment detection process is started. A waveform of the input speech signal in such a condition is shown in FIG. 11. In the case of the waveform shown in FIG. 11, it is difficult for the conventional method to accurately detect a start-point of the speech segment (such as one indicated by the arrow “A” in FIG. 11) or an end point of the speech segment (such as one indicated by the arrow “B” in FIG. 11) since an excessively large threshold level (indicated by the dotted line in FIG. 11) is provided due to the average noise power calculated by including the relatively large noise.
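The failure mode described above can be illustrated with a minimal sketch. It assumes that the threshold is the mean power of the first few frames plus a fixed margin; the cited publication only states that the threshold varies with the average noise power, so the function names, the margin, the window length, and the frame powers below are all hypothetical.

```python
def noise_adaptive_threshold(powers, n_noise_frames=3, margin=10.0):
    """Assumed rule: mean power of the first frames plus a fixed margin (dB)."""
    head = powers[:n_noise_frames]
    return sum(head) / len(head) + margin


def frames_above(powers, threshold):
    """Indices of frames whose power exceeds the threshold."""
    return [i for i, p in enumerate(powers) if p > threshold]


# Made-up frame powers in dB: background near 30, speech near 50 (frames 4-6).
quiet_start = [30, 31, 30, 30, 50, 52, 51, 30, 30]
# Same signal, but a loud click (70 dB) falls inside the noise-estimation window.
click_start = [30, 70, 30, 30, 50, 52, 51, 30, 30]

t_quiet = noise_adaptive_threshold(quiet_start)
t_click = noise_adaptive_threshold(click_start)

print(frames_above(quiet_start, t_quiet))  # speech frames 4, 5, 6 are found
print(frames_above(click_start, t_click))  # only the click frame exceeds the inflated threshold
```

With the click included in the noise estimate, the threshold rises above the speech power, so the speech frames are missed entirely in this toy example.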
Japanese Laid-Open Patent Application No. 9-050288 discloses another speech segment detection method. In this speech segment detection method, a portion of an input speech signal in which the amplitude of the input speech signal exceeds a predetermined threshold level is detected as being a start-point of a speech segment contained in the input speech signal. Another portion of the input speech signal in which the amplitude is less than the threshold level is detected as being an end point of the speech segment. In this manner, the speech segment in the input speech signal is identified based on the start-point and the end point.
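As a rough illustration of this fixed-threshold scheme, the sketch below scans a sequence of amplitude values, marks a start-point where the amplitude first exceeds the threshold, and marks an end point where it next falls below it. The function name, the sample values, and the threshold of 0.5 are hypothetical and are not taken from the cited publication.

```python
def find_segments(amplitudes, threshold):
    """Return (start, end) index pairs; end is the first index back below threshold."""
    segments = []
    start = None
    for i, a in enumerate(amplitudes):
        if start is None and a > threshold:
            start = i                      # amplitude rose above the threshold
        elif start is not None and a < threshold:
            segments.append((start, i))    # amplitude fell below the threshold
            start = None
    if start is not None:                  # segment still open at end of input
        segments.append((start, len(amplitudes)))
    return segments


# Made-up amplitude envelope with two bursts above 0.5.
amplitudes = [0.1, 0.2, 0.9, 1.1, 0.8, 0.1, 0.1, 0.7, 0.6, 0.2]
print(find_segments(amplitudes, threshold=0.5))  # [(2, 5), (7, 9)]
```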
However, the conventional method in the above publication also does not eliminate the above-described problem. In the case of the waveform shown in FIG. 11, it is difficult for the conventional method to accurately detect a start-point of the speech segment or an end point of the speech segment when a relatively large noise takes place just following the starting time of the speech segment detection process.
SUMMARY OF THE INVENTION
In order to overcome the problems described above, preferred embodiments of the present invention provide an improved speech segment detection method that effectively detects speech segments in the input speech signal even when a relatively large noise takes place just following the starting time of the speech segment detection process.
According to one preferred embodiment of the present invention, there is provided a speech segment detection method in which a sequence of speech samples is provided from an input speech signal and a sequence of feature vectors is provided from the speech samples, the feature vectors having respective speech power levels. The speech segment detection method includes the steps of: detecting a minimum speech power among the speech power levels in the feature vector sequence; computing normalized speech power levels based on the speech power levels and the minimum speech power; and comparing each of the normalized speech power levels with a predetermined threshold value to detect speech segments in the input speech signal.
In the speech segment detection method of the preferred embodiment, the minimum speech power among the received speech power levels is detected to obtain the normalized speech power levels, and a speech segment in the input speech signal is detected through the comparison of each of the normalized speech power levels with a predetermined threshold value. The speech segment detection method of the present invention is effective in accurately detecting speech segments in the input speech signal even when a relatively large noise takes place just following the starting time of the speech segment detection process.
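The three steps (minimum detection, normalization, threshold comparison) might be sketched as follows. The text does not say how the normalized levels are computed from the power levels and the minimum, so this sketch assumes a simple subtraction of the minimum in the log (dB) domain; the function names, the frame powers, and the threshold of 10 are likewise hypothetical.

```python
def normalize_powers(powers):
    """Assumed normalization: subtract the minimum power (log domain, dB)."""
    p_min = min(powers)                    # step 1: detect the minimum speech power
    return [p - p_min for p in powers]     # step 2: normalized speech power levels


def detect_segments(powers, threshold):
    """Step 3: compare each normalized level with a fixed threshold."""
    normalized = normalize_powers(powers)
    segments, start = [], None
    for i, p in enumerate(normalized):
        if p > threshold and start is None:
            start = i
        elif p <= threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(normalized)))
    return segments


# Made-up frame powers in dB: a loud click at frame 1, speech at frames 4-6.
powers = [30, 70, 30, 30, 50, 52, 51, 30, 30]
print(detect_segments(powers, threshold=10))  # [(1, 2), (4, 7)]
```

Because the reference level is the minimum rather than an average over the first frames, the early click does not raise the threshold, and the speech at frames 4 to 6 is still found. The click itself still appears as a one-frame segment in this sketch; the text here does not say how such frames are treated.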
According to another preferred embodiment of the present invention, a speech recognition system using the speech segment detection method includes: a speech input unit which converts an input speech signal into a sequence of speech samples; a feature extraction unit which provides a sequence of feature vectors from the speech samples provided by the speech input unit, the feature vectors having respective speech power levels; a speech segment detection unit which detects speech segments in the input speech signal based on the speech power levels supplied from the feature extraction unit, the speech segment detection unit detecting a minimum speech power among the speech power levels, computing normalized speech power levels based on the speech power levels and the minimum speech power, and comparing each of the normalized speech power levels with a predetermined threshold value to detect the speech segments; and a recognition unit which transforms the sequence of feature vectors into an appropriate message by comparing the feature vectors, respectively identified by the speech segments detected by the speech segment detection unit, with a set of standard patterns.
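To make the per-frame speech power level concrete, the following sketch shows one common way a feature extraction stage could derive it from speech samples. The text does not specify the framing or the power formula, so the non-overlapping frames, the frame length, the decibel scale, and the small floor added before the logarithm are all assumptions.

```python
import math


def frame_log_powers(samples, frame_len=4):
    """Split samples into non-overlapping frames; return mean-square power in dB."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # The 1e-10 floor avoids log10(0) for all-zero frames (an assumption).
        powers.append(10.0 * math.log10(energy + 1e-10))
    return powers


# Made-up samples: one silent frame followed by one full-scale frame.
samples = [0.0] * 4 + [1.0] * 4
log_powers = frame_log_powers(samples)
print(log_powers)
```

The resulting sequence of power levels is the kind of input that a speech segment detection unit would then normalize and compare with a threshold.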
According to another preferred embodiment of the present invention, a speech recognition method using the speech segment detection method includes the steps of: converting an input speech signal into a sequence of speech samples; providing a sequence of feature vectors from the speech samples, the feature vectors having respective speech power levels; detecting speech segments in the input speech signa
