Reexamination Certificate
1998-11-30
2002-01-01
Korzuch, William (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S221000, C704S243000
active
06336090
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to Automatic Speech/Speaker Recognition (ASR) and, more particularly, ASR over wireless communications channels.
BACKGROUND OF THE INVENTION
Automatic Speech/Speaker Recognition (ASR) has become ever more prevalent with improvements in hardware, modeling and recognition algorithms. Among many important applications of ASR technology are those in the telephone and other communications arts. For example, the use of ASR has proven valuable in providing directory assistance, automatic calling and other voice telephony applications over wire circuits. In a parallel area of development, the use of cellular systems, personal communications systems (PCS) and other wireless systems (collectively referred to as “wireless” in the sequel) has continued to proliferate. It is natural, therefore, to seek to apply improvements in ASR achieved in wired systems to wireless systems as well.
ASR over wireless channels is problematic because of the additional noise and distortion introduced into voice signals during the coding, transmission (e.g., due to fading or packet loss), and decoding stages. Noise-degraded voice signals present in wireless environments are often substantially different from the original voice signal, leading to degradation in ASR performance when standard ASR techniques are applied. This problem has become acute as attempts are made to create advanced ASR-based services, such as intelligent agent services or large vocabulary speech recognition services, over digital wireless channels. Previous approaches have mainly focused on noise reduction techniques, but the results are far from ideal and of limited applicability because of the many variations in wireless environments (e.g., TDMA, CDMA, GSM, etc.).
Recent studies found that if the feature vectors for ASR can be extracted at the handset and transmitted digitally through a secondary digital channel, there is almost no degradation in ASR performance in the wireless environment as compared to the wired telephone network. A typical prior art dual channel system is illustrated in FIG. 1. There, a cellular handset 101 is employed by a mobile user to encode normal speech and transmit the coded signal, including relevant coder parameters, through primary (voice) channel 105 to cellular base station 120. Base station 120 then decodes the received coded signal to produce a voice output suitable for communication over the public switched telephone network (PSTN), or other voice communications network as represented by public switch 130 and its output to a network.
FIG. 1 also shows the generation at the cellular handset 101 of a second set of signals corresponding to the ASR parameters to be used by an ASR application. This second set of signals is transmitted over a second digital channel 110 to cellular base station 120, where they are forwarded to ASR system 140.
The experimental use of systems of the type shown in FIG. 1 has generated interest in creating a standard ASR feature set which can be extracted at the handset and sent through a wireless network as a digital signal using a secondary digital link. Since the bit rate for ASR feature vector transmission can be quite low (<4 Kb/s), it is possible to use a secondary digital link such as that proposed for inclusion in new wireless standards such as IS-134. Although this secondary channel solution seems promising, it has a number of serious drawbacks. In particular, this approach requires:
1. A new standard and major changes in communication protocols. Even so, incompatibilities with many current wireless communication standards would require modifications or abandonment of existing standards-compliant network equipment.
2. Extra bandwidth to transmit ASR feature vectors from the handset to the base station. Synchronizing the primary digital channel for the transmission of voice and the secondary digital channel for the transmission of the extracted ASR feature vectors can also be a serious problem.
3. Major changes to current handsets.
4. A variety of dual-channel solutions. That is, dependence on particular present wireless standards or formats (CDMA, TDMA, GSM, IS-94, IS-134, etc.) and associated signaling and modulation schemes makes a universal solution impractical for all available standards.
5. High initial investment to introduce services based on this technique.
SUMMARY OF THE INVENTION
The limitations of the prior art are overcome and a technical advance is achieved in systems and methods for efficiently and economically enabling ASR capabilities in wireless contexts as described below in connection with illustrative embodiments.
Thus, in accordance with one aspect of the present invention, reliable ASR feature vector sequences are derived at a base station (or other network or system unit) directly from the digitally transmitted speech coder parameters. In many applications the ASR functions are performed at a public switch or elsewhere in a network. With this approach, a novel ASR feature extractor operates on the received speech coder parameters from the handset with no additional processing or signal modification required at the handset. Thus, speech coder parameters received at a base station are used not only for reproducing the voice signal, as at present, but also for generating the feature vector sequence for ASR applications.
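As a minimal sketch of how an ASR feature could be derived directly from received coder parameters, the Python function below converts linear prediction (LPC) coefficients to LPC cepstral coefficients using the standard recursion. This section does not state which coder parameters or which ASR features are used, so the choice of LPC coefficients as input and cepstra as output is an assumption made only for illustration. The function name lpc_to_cepstrum and the sign convention A(z) = 1 - sum of a_k z^-k are likewise assumptions.

```python
from typing import List, Sequence


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Convert LPC predictor coefficients to LPC cepstral coefficients.

    Assumed convention (hypothetical, not taken from the patent text):
    the all-pole model is 1 / A(z) with A(z) = 1 - sum_{k=1..p} a_k z^-k,
    and a[0] holds a_1, a[1] holds a_2, and so on.
    Returns c_1 .. c_{n_ceps}; the gain term c_0 is not computed.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is left unused
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n is within the predictor order.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k / n) * c_k * a_{n-k},
        # restricted to indices where a_{n-k} is defined (1 <= n-k <= p).
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Example: a first-order predictor, for which c_n = a_1**n / n.
cepstra = lpc_to_cepstrum([0.5], 3)
```

Because the recursion uses only the transmitted coefficients, nothing in this sketch requires the decoded voice waveform, which is the property the paragraph above relies on.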
By operating on digitally transmitted speech coder parameters before these parameters are converted back to a voice signal, an illustrative ASR feature vector extractor at the base station avoids the lossy conversion process and associated voice distortion. In using embodiments of the present invention, there is no need to modify wireless handsets, since the ASR feature vectors are derived from the same set of speech coder parameters ordinarily extracted at the handset. Therefore, existing handsets provide a front end for the ASR feature vector extractor at the base station.
Moreover, the connection from the handset to the base station in digital wireless environments is all-digital and includes error protection for data signals communicated to a base station. Therefore, the transmission from the handset to the present inventive feature extractor at a base station or other location has the same digital transmission quality as in secondary channel schemes.
Although speech coder parameters are very different from the feature vectors needed for ASR purposes, the present invention provides illustrative techniques for realizing a speech feature extractor based on normal speech coder parameters. Further, in accordance with another aspect of the present invention, perfect synchronization of the (decoded) voice signal and the ASR feature vector signal is provided without additional signal synchronization bits. This is possible, as disclosed in illustrative embodiments of the present invention, because both the voice signal and ASR feature vector signal are generated from the same speech coder parameters.
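The synchronization argument can be sketched as follows: if each received frame of coder parameters is passed both to the voice decoder and to the feature extractor, the two outputs share a frame index by construction, and no extra synchronization bits are needed. The Python below is only an illustration under that assumption. The names CoderFrame and fan_out are hypothetical, and the decoder and extractor are passed in as stand-in functions rather than real codec implementations.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class CoderFrame:
    """One received frame of speech coder parameters (hypothetical layout)."""
    index: int
    params: Tuple[float, ...]


def fan_out(
    frames: Iterable[CoderFrame],
    decode_voice: Callable[[Tuple[float, ...]], Any],
    extract_features: Callable[[Tuple[float, ...]], Any],
) -> Iterator[Tuple[int, Any, Any]]:
    """Yield (frame index, decoded voice, ASR features) for each frame.

    Both outputs are computed from the same params tuple, so they are
    aligned frame by frame without any additional alignment signal.
    """
    for frame in frames:
        yield frame.index, decode_voice(frame.params), extract_features(frame.params)


# Stand-in processing functions, used only to make the sketch runnable.
frames = [CoderFrame(0, (0.5,)), CoderFrame(1, (0.25,))]
aligned = list(fan_out(frames, lambda p: sum(p), lambda p: [2 * x for x in p]))
```

Each tuple in aligned pairs the voice output and the feature vector for one frame, in contrast to the dual-channel arrangement, in which two separately transmitted streams must be aligned after reception.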
Overall, the present invention provides systems and methods for enhanced ASR with no need for a secondary channel and no major changes to current wireless standards. Changes, extensions and operational differences at base stations are also minimal. Advantageously, the digital channel for ASR applications is created (through modifications to software) as a second destination for a voice call.
Alternative embodiments perform the ASR feature extraction and ASR functions at a switch connected (directly or through network connections) to the receiving base station. In yet other embodiments the coded speech signals received at a base station from the transmitting handset are forwarded (with or without decoded speech signals) to a network location, including a terminal or storage system.
REFERENCES:
patent: 5909662 (1999-06-01), Yamazaki et al.
patent: 5956683 (1999-09-01), Jacobs et al.
patent: 6092039 (2000-07-01), Zingher
patent: WO 95 17746 (1995-06-01), None
ETSI—European Telecommunications Standards Institute, “European Digital Cellular Tel
Chou Wu
Recchione Michael Charles
Zhou Qiru
Korzuch William
Lucent Technologies - Inc.
Ryan William
Storm Donald L.