Method for direct recognition of encoded speech data

Data processing: speech signal processing – linguistics – language – recognition

Reexamination Certificate


Details

Classification: C704S219000, C704S275000
Type: Reexamination Certificate
Status: active
Patent number: 06223157


FIELD OF THE INVENTION
The present invention relates to a method for providing robust speech recognition of encoded (or compressed) speech data.
BACKGROUND INFORMATION
Speech recognition, the machine translation of spoken utterances into a stream of recognized words or phrases, has received considerable attention from researchers in recent years. In the last decade, speech recognition systems have improved enough to become available to an ever larger number of consumers in the marketplace.
A number of applications utilizing speech recognition technology are currently being implemented in the telephone network environment, including the digital cellular network environment. For example, a telephone user's spoken commands may now determine call routing or how a call is billed (e.g., “collect call please” or “calling card”). Similarly, a telephone user may transact business by dialing a merchant's automated system and speaking a credit card number instead of dialing it. Continued use of speech recognition technology in the digital cellular environment could enhance service in countless ways.
The Internet, which has also grown and become more popular in recent years, provides another environment in which subscribers may benefit extensively from further use of speech recognition technology. For example, in the future, commercially available systems may allow a user at a remote station to specify, via voice commands, instructions which are then transmitted to an Internet host and executed.
However, Internet connection lines and digital cellular channels have limited transmission capacity with respect to audio or real-time speech. As a result, applications which involve real-time processing of large amounts of speech data over these mediums will often require data compression (or data encoding) prior to transmission. For example, the low bandwidth of the digital cellular medium requires voice data compression achieving ratios from 5-to-1 to 10-to-1, depending on the compression algorithm used. Compression algorithms used in some Internet browsers operate in this range as well.
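The bandwidth arithmetic behind these ratios can be illustrated with a quick calculation. The 8 kHz sampling rate and 16-bit sample depth below are typical toll-quality assumptions for illustration, not figures from the text:

```python
# Typical toll-quality digitized speech (illustrative assumption):
# 8,000 samples/second at 16 bits/sample.
raw_bps = 8000 * 16  # 128,000 bits per second, uncompressed

# The 5-to-1 and 10-to-1 ratios mentioned above bring this down to:
for ratio in (5, 10):
    compressed_bps = raw_bps // ratio
    print(f"{ratio}-to-1 compression: {compressed_bps} bits/second")
```

At 10-to-1 the result (12,800 bits/second) lands in the same range as common digital cellular voice codecs, which is why such ratios are required on those channels.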
Thus, in the network environment, voice data must often be compressed prior to transmission. Once the data reaches a speech recognition engine at a remote site, the limited network bandwidth is no longer a factor. Therefore, it is common practice to de-compress (or decode and reconstruct) the voice data at that point to obtain a digital representation of the original acoustic signal (i.e., a waveform). The waveform can then be processed as though it were originally generated at the remote site. This procedure (i.e., compress-transmit-decompress) allows speech recognition applications to be implemented in the network environment and overcomes issues relating to bandwidth limitation.
However, there are a number of disadvantages associated with this procedure. In particular, it generally involves redundant processing steps, as some of the work done during compression is repeated by the recognition “front-end” processing.
Much of the speech compression done today is performed by “vocoders.” Rather than create a compressed, digital approximation of the speech signal (i.e., an approximation of a waveform representation), vocoders instead construct digital approximations of components or characteristics of speech implied by a given speech model. For example, a model may define speech by the frequency of vocal cord vibration (pitch), the intensity or loudness of vocal cord movement (energy), and the resonance of the vocal tract (spectral shape). The vocoding algorithm then applies signal processing techniques to the speech signal, retaining only specific signal components, including those measuring the pitch, energy, and spectral characteristics of the speech.
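The spectral component of such a model is commonly captured with linear predictive coding (LPC), which the SUMMARY below names as a base feature type. The following is a rough sketch of that kind of analysis; the autocorrelation method and Levinson-Durbin recursion are standard textbook choices used here for illustration, not the patent's specific procedure:

```python
import numpy as np

def autocorrelation(frame, order):
    """First `order` + 1 autocorrelation lags of a windowed speech frame."""
    return np.array([np.dot(frame[: len(frame) - k], frame[k:])
                     for k in range(order + 1)])

def levinson_durbin(r, order):
    """Solve for LPC coefficients a[1..order] (with a[0] == 1) from lags r."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                       # prediction-error (residual) energy
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err               # i-th reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err
```

A vocoder quantizes and transmits parameters equivalent to these (plus pitch and energy terms) instead of the waveform itself.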
In similar fashion, a speech recognition system operates by applying signal processing techniques to extract spectral and energy information from a stream of incoming speech data. To generate a recognition result, the extracted speech components are converted into a “feature” and then used in the alignment subsystem, where the incoming feature is compared to the representative features of the models.
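The two stages just described, feature extraction followed by alignment against model features, can be sketched in miniature. The band-energy feature and nearest-neighbor comparison below are deliberately simplified stand-ins for a real front-end and alignment subsystem:

```python
import numpy as np

def frame_feature(frame, n_bands=4):
    """Toy front-end: log frame energy plus log energies of coarse spectral bands."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    bands = np.array_split(spectrum, n_bands)
    band_energies = np.log([b.sum() + 1e-10 for b in bands])
    log_energy = np.log(np.dot(frame, frame) + 1e-10)
    return np.concatenate(([log_energy], band_energies))

def align(feature, models):
    """Toy alignment: return the name of the closest representative feature."""
    names = list(models)
    dists = [np.linalg.norm(feature - models[name]) for name in names]
    return names[int(np.argmin(dists))]
```

A real recognizer compares sequences of features over time (e.g., with dynamic programming), but the principle is the same: extract components, then measure closeness to stored models.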
Thus, when vocoded speech data is reconstructed into a waveform signal (decompressed) prior to speech recognition processing, speech components (or features) are effectively computed twice. First, during compression, the vocoder will decompose the original (digitized) signal into speech components. Then, during recognition processing, if the incoming data is a reconstructed waveform, the recognition facility must again extract the same or similar features from the reconstructed signal.
Obviously, this procedure is not optimally efficient. This is particularly true when the step of determining features from the reconstructed signal (i.e., its digital representation) involves significant computational resources and added processing time.
SUMMARY OF THE INVENTION
Accordingly, one advantage of the present invention is that it saves processing time and computational resources by bypassing redundant decompression processing.
Another advantage of the present invention is that it takes advantage of processing already performed during vocoding (i.e., speech data compression).
Another advantage of the present invention is that it renders speech recognition applications practiced in a network environment less complex.
In short, the present invention overcomes the disadvantages of the above-described procedure (compress-transmit-decompress). More specifically, the present invention provides a system and method for mapping a vocoded representation of parameters defining speech components, which in turn define a particular waveform, into a base feature type representation of parameters defining speech components (e.g., Linear Predictive Coding (“LPC”)), which in turn define the same digital waveform. This conversion is accomplished through a transform algorithm that is developed prior to system operation but executed during operation.
As a result, during operation, construction of the base feature type used in recognition does not require reconstruction of the waveform from vocoded parameters.
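One concrete illustration of such a parameter-to-parameter mapping: several vocoder families transmit reflection coefficients, which are algebraically convertible to LPC coefficients by the standard "step-up" recursion, with no waveform reconstruction involved. The choice of reflection coefficients as the vocoded representation here is an illustrative assumption, not the patent's specific transform:

```python
def reflection_to_lpc(reflection_coeffs):
    """Step-up recursion: reflection coefficients -> LPC polynomial a[0..order].

    Maps one parameter set defining the speech spectrum directly onto
    another, bypassing any reconstruction of the audio waveform.
    """
    a = [1.0]
    for i, k in enumerate(reflection_coeffs, start=1):
        a_prev = a[:]
        a = a_prev + [0.0]
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
    return a
```

Because the conversion is a small, fixed recurrence over a handful of coefficients per frame, it is far cheaper than synthesizing a waveform and re-running front-end analysis on it.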


REFERENCES:
patent: 5297194 (1994-03-01), Hunt et al.
patent: 5305421 (1994-04-01), Li
patent: 5377301 (1994-12-01), Rosenberg et al.
patent: 5487129 (1996-01-01), Paiss et al.
patent: 5680506 (1997-10-01), Kroon et al.
patent: 5692104 (1997-11-01), Chow et al.
patent: 5787390 (1998-07-01), Quinquis et al.
patent: 6003004 (1999-12-01), Hershkovits et al.
Yapp et al., “Speech Recognition on MPEG/Audio Encoded Files,” IEEE International Conference on Multimedia Computing and Systems '97, pp. 624-625, Jun. 1997.
