Speech processing apparatus and method and computer readable...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

active

06236962

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to a technique for performing speech recognition by using a feature of a speech time series, such as a cepstrum or the like.
The invention also relates to a technique for the removal of convolution distortion, such as line characteristics or the like.
The invention further relates to a technique for enabling an instantaneous or successive adaptation to noise.
2. Related Background Art
When speech recognition is performed in a real environment, problems can be caused by convolution distortion of the line characteristics, which arises from the influence of a microphone, telephone line characteristics, or the like, and by additive noise such as internal noise. As a method of coping with the distortion of the line characteristics, a Cepstrum Mean Subtraction (CMS) method has been proposed. The CMS method is disclosed in detail in, for example, Rahim, et al., “Signal Bias Removal for Robust Telephone Based Speech Recognition in Adverse Environments”, Proc. of ICASSP'94, 1994.
The CMS method is a method of compensating for the distortion of the line characteristics. According to such a method, on the basis of information extracted from input speech, the line distortion is corrected on the input time series side or on the model side, such as an HMM or the like, thereby adapting to the input environment. Thus, even if the line characteristics fluctuate, it is possible to cope flexibly with such a situation.
The CMS method is a method of compensating for convolution distortion (line distortion), which arises from convolution with an impulse response. A long-time spectrum of the input speech is subtracted from the input speech, and a long-time spectrum of the speech used in forming the model is subtracted from the model, thereby normalizing the difference in the line characteristics. The normalizing process is generally performed in a logarithmic spectrum region or a cepstrum region. Since the convolution distortion appears as an additive distortion in those two regions, the convolution distortion can be compensated for by a subtraction. A method of performing such a process in the cepstrum region is called a CMS method.
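The subtraction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any cited method: the function names `cepstral_mean` and `cepstral_mean_subtraction` are hypothetical, the "cepstra" are synthetic random values rather than real speech features, and the line characteristic is modeled as a constant offset added to every frame, which is how a fixed convolution distortion would appear in the cepstrum region. Only the input side is shown; the corresponding subtraction on the model side is omitted.

```python
import random

def cepstral_mean(frames):
    """Long-time mean (CM) of each cepstral coefficient over all frames."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

def cepstral_mean_subtraction(frames):
    """Return the frames with the per-coefficient long-time mean removed."""
    cm = cepstral_mean(frames)
    return [[c - m for c, m in zip(frame, cm)] for frame in frames]

random.seed(0)
NUM_FRAMES, NUM_COEFFS = 200, 12  # illustrative sizes, not from the source

# Synthetic stand-in for the cepstra of undistorted speech.
clean = [[random.gauss(0.0, 1.0) for _ in range(NUM_COEFFS)]
         for _ in range(NUM_FRAMES)]

# Assumed constant line characteristic: additive in the cepstrum region.
channel = [random.gauss(0.0, 1.0) for _ in range(NUM_COEFFS)]
observed = [[c + h for c, h in zip(frame, channel)] for frame in clean]

norm_clean = cepstral_mean_subtraction(clean)
norm_observed = cepstral_mean_subtraction(observed)

# After CMS the distorted and undistorted series should coincide,
# up to floating-point rounding.
max_diff = max(abs(a - b)
               for fa, fb in zip(norm_clean, norm_observed)
               for a, b in zip(fa, fb))
print(f"largest difference after CMS: {max_diff:.2e}")
```

Because the mean of the distorted series equals the mean of the clean series plus the constant offset, the offset cancels in the subtraction, and the reported difference is at the level of rounding error.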
By using the CMS method as mentioned above, it is possible to cope with the distortion of the line characteristics due to the influence of the microphone, telephone line characteristics, or the like. In the case of using the CMS method, however, in order to compute a cepstrum long-time mean (CM) from the speech inputted as a recognition target, the user has to wait for completion of the input of that speech. The recognizing process is performed after the CM has been obtained, namely, after the end of the speech input. Therefore, a recognition algorithm cannot operate synchronously with the speech input. Consequently, a real-time process cannot be performed according to the conventional method.
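The latency constraint can be made concrete with a small sketch. The batch function below cannot return anything until the last frame has arrived, which is the limitation described above. For contrast, a generic running mean over the frames received so far is also shown. That running mean is only an assumed illustration of frame-synchronous processing; it is not the method of the invention, which this section does not describe in detail. The class `RunningCepstralMean` and the toy frame values are hypothetical.

```python
import math

def batch_mean(frames):
    """Cepstrum long-time mean (CM); needs every frame of the utterance."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]

class RunningCepstralMean:
    """Hypothetical helper: mean over only the frames received so far."""

    def __init__(self, num_coeffs):
        self.count = 0
        self.mean = [0.0] * num_coeffs

    def normalize(self, frame):
        """Update the running mean with one frame, then subtract it."""
        self.count += 1
        self.mean = [m + (c - m) / self.count
                     for m, c in zip(self.mean, frame)]
        return [c - m for c, m in zip(frame, self.mean)]

frames = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]  # toy values, not real cepstra

# Conventional CMS: nothing can be normalized before the utterance ends.
cm = batch_mean(frames)
batch_out = [[c - m for c, m in zip(f, cm)] for f in frames]

# Frame-synchronous variant: each frame is normalized as it arrives.
tracker = RunningCepstralMean(num_coeffs=2)
stream_out = [tracker.normalize(f) for f in frames]

print("batch :", batch_out)
print("stream:", stream_out)
```

The running estimate reaches the batch CM only after the final frame, so early frames are normalized with an incomplete mean and differ from the batch result. That gap is the trade-off between waiting for the full utterance and processing in step with the input.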
SUMMARY OF THE INVENTION
According to the invention, a distortion of line characteristics, which can fluctuate, can be compensated for at high speed in a semi-real-time manner, so that speech recognition can be performed in real time and with high precision after the line characteristics have been normalized.


REFERENCES:
patent: 4227046 (1980-10-01), Nakajima et al.
patent: 5208863 (1993-05-01), Sakurai et al.
patent: 5220629 (1993-06-01), Kosaka et al.
patent: 5369728 (1994-11-01), Kosaka et al.
patent: 5515475 (1996-05-01), Gupta et al.
patent: 5583961 (1996-12-01), Pawlewski et al.
patent: 5598505 (1997-01-01), Austin et al.
patent: 5604839 (1997-02-01), Acero et al.
patent: 5621849 (1997-04-01), Sakurai et al.
patent: 5787396 (1998-07-01), Komori et al.
patent: 5797116 (1998-08-01), Yamada et al.
patent: 5812975 (1998-09-01), Komori et al.
Chien et al “Noisy speech recognition using variance adapted likelihood measure” 1996 IEEE 45-48.*
Openshaw et al “On the limitations of cepstral features in noise” 1994 IEEE II-49-II-52.*
Acero et al “Robust speech recognition by normalization of the acoustic space” 1991 IEEE 893-896.*
Matsui, T., et al., “N-Best-Based Instantaneous Speaker Adaptation Method for Speech Recognition”, Proceedings ICSLP 96, Fourth International Conference on Spoken Language Processing, Philadelphia, PA, Oct. 3-6 1996, vol. 2, pp. 973-976.
R. Schwartz, et al., “A Comparison of Several Approximate Algorithms for Finding Multiple (N-BEST) Sentence Hypotheses”, ICASSP 91, vol. 1, May 1991, Toronto, Ontario, Canada, S10.4, pp. 701-704.
F.K. Soong, et al., “A Tree-Trellis Based Fast Search for Finding the N Best Sentence Hypotheses in Continuous Speech Recognition”, ICASSP 91, vol. 1, May 1991, Toronto, Ontario, Canada, S10.5, pp. 705-708.
M. Rahim, et al. “Signal Bias Removal for Robust Telephone Based Speech Recognition in Adverse Environments”, ICASSP-94, vol. 1, Apr. 1994, Adelaide, South Australia, pp. I-445-I-448.
