Speech recognition apparatus and method performing speech...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S236000, C704S238000, C704S243000, C704S256000

Reexamination Certificate

active

06823304

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a speech recognition technique using a DP (Dynamic Programming) matching method, an HMM (Hidden Markov Model) method or the like, and more particularly, to a speech recognition apparatus and a speech recognition method whose recognition accuracy is improved by correctly detecting a consonant at the leading position of a speech (hereinafter referred to as a lead consonant).
2. Description of the Background Art
In recent years, speech recognition apparatuses have been actively developed for information processing systems such as personal computers and word processors in order to enable text input and the like by speech. Conventional speech recognition apparatuses commonly use techniques such as the DP matching method, in which variation in speaking rate is effectively absorbed by pattern matching with non-linear expansion and contraction of the time axis, and the HMM method, by which high recognition accuracy can be attained even against variations in voice spectrum caused by individual differences among speakers.
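As an illustration of how non-linear warping of the time axis absorbs differences in speaking rate, the following is a minimal sketch of the textbook dynamic time warping recurrence. It is not the matching procedure of this patent, which is not given in this section. The scalar features, the absolute-difference local cost and the function name `dtw_distance` are assumptions made for the example.

```python
def dtw_distance(a, b):
    """Cumulative alignment cost between two sequences of scalar features.

    Assumption: each frame is a single number and the local cost is the
    absolute difference; real feature parameters would be vectors.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allow a frame to be repeated on either axis or both to advance.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]  # same shape, half speed
other = [3.0, 3.0, 1.0, 1.0, 3.0]                          # different shape

same_word_cost = dtw_distance(template, slow)
other_word_cost = dtw_distance(template, other)
```

In this toy input the slowed-down sequence aligns with the template at zero cost because every frame can be matched to a repeated frame, while the differently shaped sequence cannot.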
FIG. 1 is a block diagram representing the schematic configuration of a conventional speech recognition apparatus. The speech recognition apparatus includes: a microphone 101 converting a speech of a speaker to an analog electrical signal; an A/D (Analog/Digital) converter 102 converting the analog signal outputted from the microphone 101 to sound data of digital information; a sound analyzer 103 analyzing the sound data outputted from the A/D converter 102 to convert it to a feature parameter 104; a speech detector 105 detecting an interval of the speech using the sound data outputted from the A/D converter 102; a matching processing unit 106 performing matching processing of the feature parameter 104 with registered data based on a detection result obtained by the speech detector 105; and a recognition judgment unit 107 performing judgment on recognition based on a matching result obtained by the matching processing unit 106 to output a recognition result 108.
Feature parameters adopted here are as follows: power, Δ power, LPC (Linear Predictive Coding) cepstrum, LPC Δ cepstrum and others.
The speech detector 105 calculates sound power through operation of the following equation based on the sound data and judges an interval in which sound power exceeds a prescribed threshold value as a speech interval:

$$P = \sum_{i=0}^{N} x_i^{2} \qquad (1)$$

where $x_i$ is an amplitude value of an ith sound in a frame and N is the number of samples in one frame.
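The power-threshold detection of equation (1) can be sketched as follows. This is a minimal illustration, not the apparatus's implementation: the non-overlapping framing, the threshold value and the names `frame_power` and `detect_speech_frames` are assumptions, and the sum is taken over all samples held in each frame.

```python
def frame_power(frame):
    """Sum of squared amplitudes over one frame, following equation (1)."""
    return sum(x * x for x in frame)


def detect_speech_frames(samples, frame_len, threshold):
    """Return one flag per frame: True where power exceeds the threshold.

    Assumption: frames are consecutive and non-overlapping; a trailing
    partial frame is ignored.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_power(frame) > threshold)
    return flags


# Quiet frame, loud frame, quiet frame (hypothetical amplitude values).
samples = [0, 1, -1, 0, 5, -6, 7, -5, 1, 0, -1, 0]
flags = detect_speech_frames(samples, frame_len=4, threshold=10)
```

Here only the middle frame is flagged as speech. A low-power lead consonant would be treated the same way as the quiet frames once the threshold has to be raised above the noise level.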
In a case where no noise is mixed into a speech as shown in FIG. 2A, the above described speech interval detection method can correctly detect a lead consonant interval of the speech from the sound data, and the recognition judgment unit 107 can output a correct recognition result for the speech interval.
However, in a case where the S/N ratios of the microphone 101 and other components are poor and noise is mixed into a speech as shown in FIG. 2B, the lead consonant interval of the speech is buried in the noise. The sound data then lacks information on the lead consonant component, and the recognition judgment unit 107 can output only a limited recognition result based on the detectable range.
Alternatively, a method such as the spectral subtraction technique can be adopted, in which the frequency information of the noise is detected in advance, its average is calculated, and the average is subtracted from each speech frame before the lead consonant interval is detected. This method, however, has problems: it increases the amount of computation, which prevents high speed processing, and in an environment with high noise levels it may adversely affect the waveforms of the speech to be analyzed, thereby preventing correct speech recognition.
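The averaging and subtraction steps described for the spectral subtraction approach can be sketched as below. This is only an illustration under stated assumptions: magnitude spectra are taken as already computed per frame, the clamping of negative results to a floor of zero is a common practical choice that the text does not mention, and the function names are hypothetical.

```python
def average_spectrum(noise_frames):
    """Average the magnitude spectra of noise-only frames, bin by bin."""
    count = len(noise_frames)
    return [sum(bin_values) / count for bin_values in zip(*noise_frames)]


def spectral_subtract(frame, noise_avg, floor=0.0):
    """Subtract the average noise spectrum from one speech frame.

    Assumption: results below `floor` are clamped, since a magnitude
    cannot be negative.
    """
    return [max(mag - noise, floor) for mag, noise in zip(frame, noise_avg)]


# Hypothetical 3-bin magnitude spectra.
noise_frames = [[1.0, 2.0, 1.0], [3.0, 2.0, 1.0]]
noise_avg = average_spectrum(noise_frames)
cleaned = spectral_subtract([5.0, 1.0, 4.0], noise_avg)
```

The second bin shows the drawback noted above: where the noise estimate exceeds the observed magnitude, the speech information in that bin is lost. The per-frame spectrum computation and subtraction also add work to every frame.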
SUMMARY OF THE INVENTION
It is accordingly an object of the present invention to provide a speech recognition apparatus and a speech recognition method capable of causing matching processing to reflect information on a lead consonant component even when the lead consonant cannot be detected due to a noise.
It is another object of the present invention to provide a speech recognition apparatus and a speech recognition method capable of solving a deviation of a start edge position in the matching processing.
It is still another object of the present invention to provide a speech recognition apparatus and a speech recognition method in which a speech recognition speed is increased by reducing the number of matching processing times.
It is a further object of the present invention to provide a speech recognition apparatus and a speech recognition method capable of outputting a recognition result with high possibility even when no correct recognition result is attained.
According to an aspect of the present invention, a speech recognition apparatus includes: a sound analyzer converting sound data to a feature parameter; a voiced sound detector detecting a voiced sound component at a leading position (hereinafter referred to as a lead voiced sound) from the sound data; a lead consonant buffer storing a feature parameter preceding a lead voiced sound detected by the voiced sound detector as a feature parameter of a lead consonant therein; and a recognition processing section performing recognition processing referring to the feature parameter of the lead consonant stored in the lead consonant buffer.
Since the feature parameter preceding a lead voiced sound detected by the voiced sound detector is stored in the lead consonant buffer as a feature parameter of a lead consonant, recognition processing reflecting information on a lead consonant can be performed even when the lead consonant is not detected due to a noise.
According to another aspect of the present invention, a speech recognition method includes the steps of: converting sound data to a feature parameter; detecting a lead voiced sound from the sound data; storing a feature parameter preceding the lead voiced sound detected as a feature parameter of a lead consonant; and performing recognition processing referring to the feature parameter of a lead consonant stored.
Since a feature parameter preceding a lead voiced sound detected is stored as a feature parameter of a lead consonant, matching processing reflecting information on the lead consonant can be performed even when the lead consonant is not detected due to a noise.
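The idea of keeping the feature parameters that precede the lead voiced sound can be sketched with a fixed-length buffer. This is a hedged illustration only: the summary does not state how the voiced sound is detected or how many preceding frames are stored, so the per-frame `voiced_flags`, the `buffer_len` parameter and the function name `capture_lead_consonant` are all assumptions.

```python
from collections import deque


def capture_lead_consonant(features, voiced_flags, buffer_len):
    """Return (frames preceding the first voiced frame, index of that frame).

    Assumption: one feature parameter and one voiced/unvoiced flag are
    available per frame, and only the last `buffer_len` frames are kept.
    """
    buffer = deque(maxlen=buffer_len)  # oldest frames drop out automatically
    for index, (feat, voiced) in enumerate(zip(features, voiced_flags)):
        if voiced:
            # Frames held at this moment are treated as the lead consonant.
            return list(buffer), index
        buffer.append(feat)
    return list(buffer), None  # no voiced sound found


# Placeholder feature parameters; real ones would be numeric vectors.
features = ["f0", "f1", "f2", "f3", "f4", "f5"]
voiced_flags = [False, False, False, False, True, True]
lead, start = capture_lead_consonant(features, voiced_flags, buffer_len=2)
```

With these inputs the two frames immediately before the first voiced frame are returned as the lead consonant features, so they remain available to recognition processing even if a power threshold would have missed them.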
The foregoing and other objects, features, aspects and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.


REFERENCES:
patent: 5649056 (1997-07-01), Nitta
patent: 09-068995 (1997-03-01), None
Steven F. Boll, “Suppression Of Acoustic Noise in Speech Using Spectral Subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-27, No. 2, Apr. 1979, pp. 113-120.
