Reexamination Certificate
2000-12-05
2004-09-14
Dorvil, Richemond (Department: 2654)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S223000, C704S214000, C704S219000, C381S083000
active
06792405
ABSTRACT:
TECHNICAL FIELD
The present invention relates to automatic speech recognition and, more particularly, to a bitstream-based feature extraction process for wireless communication applications.
BACKGROUND OF THE INVENTION
In the provisioning of many new and existing communication services, voice prompts are used to aid the speaker in navigating through the service. In particular, a speech recognizing element is used to guide the dialogue with the user through voice prompts, usually questions aimed at defining which information the user requires. An automatic speech recognizer is used to recognize what is being said and the information is used to control the behavior of the service rendered to the user.
Modern speech recognizers make use of phoneme-based recognition, which relies on phone-based sub-word models to perform speaker-independent recognition over the telephone. In the recognition process, speech “features” are computed for each incoming frame. Modern speech recognizers also have a feature called “rejection”. When rejection exists, the recognizer has the ability to indicate that what was uttered does not correspond to any of the words in the lexicon.
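The rejection behaviour described above can be illustrated with a minimal Python sketch. It assumes the recognizer produces one confidence score per lexicon word (higher is better) and rejects the utterance when the best score falls below a threshold. The function name, the score scale and the threshold value are hypothetical and are not taken from the patent.

```python
def recognize_with_rejection(scores, threshold):
    """Return the best-scoring lexicon word, or None when the utterance is rejected.

    scores    -- dict mapping each lexicon word to a confidence score (assumed scale)
    threshold -- minimum score required to accept the best word (assumed value)
    """
    if not scores:
        return None  # nothing was scored, so nothing can be accepted
    best_word = max(scores, key=scores.get)
    # Reject when even the best candidate is not confident enough.
    return best_word if scores[best_word] >= threshold else None


accepted = recognize_with_rejection({"yes": 0.9, "no": 0.2}, threshold=0.5)
rejected = recognize_with_rejection({"yes": 0.3, "no": 0.2}, threshold=0.5)
```

Here `accepted` is the word "yes", while `rejected` is `None`, signalling that the utterance matched no word in the lexicon well enough.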
The users of wireless communication services expect to have access to all of the services available to the users of land-based wireline systems, and to receive a similar quality of service. The voice-activated services are particularly important to the wireless subscribers since the dial pad is generally away from sight when the subscriber listens to a vocal prompt, or is out of sight when driving a car. With speech recognition, there are virtually no restrictions on mobility, because callers do not have to take their eyes off the road to punch in the keys on the terminal.
Currently, one area of research is focusing on the front-end design for a wireless speech recognition system. In general, many prior art front-end designs fall into one of two categories, as illustrated in FIG. 1. FIG. 1(a) illustrates an arrangement 10 including a speech encoder 12 at the transmitting end, a communication channel 14 (such as a wireless channel) and a speech decoder 16 at the receiving end. The decoded speech is thereafter sent to EAR and also applied as an input to a speech recognition feature extractor 18, where the output from extractor 18 is thereafter applied as an input to an automatic speech recognizer (not shown). In a second arrangement 20 illustrated in FIG. 1(b), a speech recognition feature encoder 22 is used at the transmitting end to allow for the features themselves to be encoded and transmitted over the (wireless) channel 24. The encoded features are then applied as parallel inputs to both a speech decoder 26 and a speech recognition feature extractor 28 at the receiving end, the output from feature extractor 28 thereafter applied as an input to an automatic speech recognizer (not shown). This scheme is particularly useful in Internet access applications. For example, when the mel-frequency cepstral coefficients are compressed at a rate of approximately 4 kbit/s, the automatic speech recognizer (ASR) at the decoder side of the coder exhibits a performance comparable to a conventional wireline ASR system. However, this scheme is not able to generate synthesized speech of the quality produced by the system as shown in FIG. 1(a).
The need remaining in the prior art, therefore, is to provide an ASR front-end whose feature recognition performance is comparable to a wireline ASR and is also able to provide decoded speech of high quality.
SUMMARY OF THE INVENTION
The need remaining in the prior art is addressed by the present invention, which relates to a feature extraction system and method and, more particularly, to a bitstream-based extraction process that converts the quantized spectral information from a speech coder directly into a cepstrum.
In accordance with the present invention, the bitstream of the encoded speech is applied in parallel as inputs to both a front-end speech decoder and feature extractor. The feature parameters consist of both spectral envelope and voicing information. The spectral envelope is derived from the quantized line spectrum pairs (LSPs) followed by conversion to LPC cepstral coefficients. The voiced/unvoiced information is directly obtained from the bits corresponding to adaptive and fixed codebook gains of a speech coder. Thus, the cepstrum is directly converted in the speech decoder from the spectral information bits of the speech coder. The use of both the spectral envelope information and the voiced/unvoiced information yields a front-end feature extractor that is greatly improved over the prior art models.
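The spectral-envelope path described above (quantized LSPs converted to LPC cepstral coefficients) can be sketched in Python as follows. This is a minimal illustration built from the standard textbook LSP-to-LPC reconstruction and LPC-to-cepstrum recursion, not the patent's own implementation. The function names, the order-2 example and the assumption that the LSPs have already been dequantized into ascending angular frequencies in radians are all illustrative choices.

```python
import math


def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsf_to_lpc(lsf):
    """Convert line spectral frequencies to LPC coefficients [1, a_1, ..., a_p].

    lsf -- ascending angular frequencies in radians, even count (assumed input form)
    Uses A(z) = (P(z) + Q(z)) / 2, where P(z) has a root at z = -1 and takes the
    1st, 3rd, ... frequencies, and Q(z) has a root at z = +1 and takes the rest.
    """
    p_poly = [1.0, 1.0]   # (1 + z^-1)
    q_poly = [1.0, -1.0]  # (1 - z^-1)
    for i, w in enumerate(lsf):
        factor = [1.0, -2.0 * math.cos(w), 1.0]  # 1 - 2cos(w) z^-1 + z^-2
        if i % 2 == 0:
            p_poly = poly_mul(p_poly, factor)
        else:
            q_poly = poly_mul(q_poly, factor)
    # The z^-(p+1) terms of P and Q cancel, so the last coefficient is dropped.
    return [(x + y) / 2.0 for x, y in zip(p_poly, q_poly)][:-1]


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole model 1/A(z), with a = [1, a_1, ..., a_p].

    Recursion: c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k},
    with a_m taken as 0 for m > p. The gain term c_0 is omitted.
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)  # c[0] is a placeholder and is not returned
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for k in range(max(1, n - p), n):
            acc += k * c[k] * a[n - k]
        a_n = a[n] if n <= p else 0.0
        c[n] = -a_n - acc / n
    return c[1:]


# Order-2 example: these two frequencies correspond to A(z) = 1 - 0.9 z^-1 + 0.2 z^-2.
lsf = [math.acos(0.85), math.acos(0.05)]
lpc = lsf_to_lpc(lsf)           # approximately [1.0, -0.9, 0.2]
ceps = lpc_to_cepstrum(lpc, 3)  # approximately [0.9, 0.205, 0.063]
```

Because the example filter has poles at 0.4 and 0.5, its cepstrum has the closed form c_n = (0.4^n + 0.5^n)/n, which gives a simple check on the recursion. A real front-end would use the coder's own LSP order and dequantization tables, which are not specified here.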
REFERENCES:
patent: 4975955 (1990-12-01), Taguchi
patent: 5732389 (1998-03-01), Kroon et al.
patent: 6009383 (1999-12-01), Mony
patent: 6009391 (1999-12-01), Asghar et al.
patent: 6067513 (2000-05-01), Ishimitsu
patent: 6078886 (2000-06-01), Dragosh et al.
patent: 6092039 (2000-07-01), Zingher
patent: 6141641 (2000-10-01), Hwang et al.
Barnwell et al., “Speech Coding: A Computer Laboratory Textbook,” 1996, John Wiley & Sons, Inc. pp. 85-88, 101-103.*
Kim et al., “Enhanced Distance Measure for LSP-based Speech Recognition,” 1993, Electronics Letters, vol. 29, No. 16, pp. 1463-1465.*
Kleijn et al., “Speech Coding and Synthesis,” 1995, Elsevier pp. 26-374, 458-462.*
Rabiner et al., “Fundamentals of Speech Recognition,” 1993, Prentice Hall, pp. 112-117.
Cox Richard Vandervoort
Kim Hong Kook
AT&T Corp.
Dorvil Richemond
Harper V. Paul