Reexamination Certificate
1998-01-29
2002-11-12
Korzuch, William (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S273000, C704S275000, C379S067100, C379S088060
active
06480825
ABSTRACT:
BACKGROUND OF THE INVENTION
The invention is directed to a system and method for detecting a recorded voice that can be used to determine whether an individual is employing a recording device in an attempt to defraud an automatic speaker recognition (“ASR”) system.
1. Field of the Invention.
The invention relates to the fields of digital speech processing and speaker recognition.
2. Description of Related Art
Voice identification and verification systems, sometimes known as automatic speaker recognition (“ASR”) systems, attempt to match the voice of the person whose identity is undergoing identification or verification with the voice of a known user enrolled in the system. ASR systems have, in recent years, become quite reliable in recognizing speech samples generated by enrolled users. Accordingly, ASR systems could potentially be employed in a wide variety of applications.
For example, many banks permit customers to transfer money from their accounts over the phone. A bank normally provides a customer with a numeric password that must be entered via a touch-tone phone before the customer can access his/her account. Should that password be stolen, however, an imposter could gain access to the customer's account. Consequently, banks could add a measure of security by employing an ASR system, wherein a customer's voice must be verified before gaining access to his/her account. ASR systems may also be used, among other things, to: protect personal records; provide physical building security by controlling access to doors; and verify the presence of a convict subject to home detention.
ASR systems can be divided generally into two categories: text-dependent and text-independent. A text-dependent ASR system requires that the user speak a specific password or phrase (the “password”) to gain access. This password is determined by the system or by the user during enrollment, and the system generates and stores a “voice print” from samples of the user saying his/her particular password. A voice print is a mathematical model generated from certain of the user's speech characteristics exhibited during enrollment. During each subsequent verification attempt, the user is prompted again to speak the password. The system extracts the same speech characteristics from the verification sample and compares them to the voice print generated during enrollment.
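A minimal sketch of the text-dependent enrollment and verification flow described above, assuming per-frame feature vectors (for example, cepstral coefficients) have already been extracted. Here the "voice print" is simply the average enrollment feature vector, and verification compares it with the average verification vector by cosine similarity against a hypothetical threshold of 0.9. Practical systems use much richer statistical models, so `make_voice_print`, `verify`, and the threshold are illustrative assumptions, not the method of this patent.

```python
import math
from typing import List, Sequence

Vector = List[float]


def make_voice_print(frames: Sequence[Sequence[float]]) -> Vector:
    """Average per-frame feature vectors into a single (hypothetical) voice print."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    totals = [0.0] * dim
    for frame in frames:
        for i, value in enumerate(frame):
            totals[i] += value
    return [t / len(frames) for t in totals]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify(voice_print: Sequence[float],
           verification_frames: Sequence[Sequence[float]],
           threshold: float = 0.9) -> bool:
    """Extract the same summary from the verification sample and compare it to the voice print."""
    candidate = make_voice_print(verification_frames)
    return cosine_similarity(voice_print, candidate) >= threshold
```

For example, `verify(make_voice_print([[1.0, 2.0, 3.0], [1.2, 1.8, 3.1]]), [[1.1, 1.9, 3.0]])` accepts, while a sample with very different features such as `[[3.0, -1.0, 0.5]]` is rejected.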
In a text-independent ASR system, the system builds a more general model of a user's voice characteristics during enrollment. This usually requires the user to speak several sentences during enrollment, rather than a simple password, so as to generate a complete set of phonemes on which the model may be based. Verification in a text-independent system can involve active prompting or passive monitoring. In an active-prompting system, the user is prompted to state specific words or phrases that are distinct from the words or phrases spoken during enrollment. Such systems first check that the prompted words were spoken and then determine whether an authorized user spoke those words. In a passive-monitoring system, the user is expected to speak conversationally after access, and the system monitors the conversation passively until it can determine whether the user is authorized. In either case, verification usually requires eight to ten seconds of speech, compared with the one to two seconds required in a text-dependent system.
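The two-stage check of an active-prompting system can be sketched as a small decision function. This assumes two inputs produced elsewhere: a `transcript` from a separate speech recognizer and a `speaker_score` from a speaker model. The function name, return strings, and the 0.8 threshold are hypothetical.

```python
def active_prompt_verify(prompt: str, transcript: str, speaker_score: float,
                         score_threshold: float = 0.8) -> str:
    """Hypothetical two-stage decision for an active-prompting text-independent system."""
    # Stage 1: were the prompted words actually spoken? Compare case- and spacing-insensitively.
    if transcript.lower().split() != prompt.lower().split():
        return "reject: wrong phrase"
    # Stage 2: did an authorized user speak them? Compare the speaker score to a threshold.
    if speaker_score < score_threshold:
        return "reject: speaker mismatch"
    return "accept"
```

Checking the words first is cheap and rejects a replayed recording of the wrong phrase before any speaker modeling is needed.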
Despite their potential for widespread use, ASR systems have enjoyed only limited application to date. One reason is that an imposter can defraud an ASR system by playing a recording of an authorized user's voice. If the recording is of high enough quality, an ASR system recognizes the recorded voice as that of an authorized user and grants access. A variety of recording devices can be used to defraud ASR systems, including wiretapping and tape recording devices. For example, unauthorized users have bugged public telephones with a tape recording device mounted in the vicinity of the phone booth or in the receiver of the phone. In addition, an imposter can steal digital voice or speech files, or digital audio tapes, of an authorized user and use them to gain unauthorized access to systems protected by ASR techniques.
Some text-independent systems may inherently avoid this problem. In an active-prompting text-independent system, an imposter will not have advance notice of the phrase required to be spoken during verification and is, therefore, unlikely to have the proper phrase recorded. Further, in a passive-monitoring text-independent system, the imposter would need a recording of an authorized user's entire conversation to gain access.
As discussed, however, text-independent systems have drawbacks that make them ill-suited to many applications. For example, active-prompting text-independent systems can be less user-friendly than text-dependent systems: a bank customer is likely to complain about having to speak long phrases to gain access to his/her accounts. In addition, in many applications a user is not expected to speak at all after access, which makes passive-monitoring text-independent systems less useful.
U.S. Pat. No. 5,548,647, entitled “Fixed Text Speaker Verification Method and Apparatus,” issued to Naik et al. on Aug. 20, 1996, provides one method for reducing fraudulent access to a text-dependent system. In the disclosed method, an authorized user enrolls using a number of passwords, such as the numbers one through nine. During verification, the user is prompted to speak one or more of those passwords, chosen at random. Without advance notice of the specific password required for access, an imposter is less likely to have the proper recorded password immediately at hand.
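The random-prompt idea can be illustrated with a short sketch. It assumes the nine-word enrollment set from the example and that each prompt position is drawn independently with repeats allowed; both are assumptions, since the text does not say how Naik et al. construct prompts. `random_prompt` and `prompt_space_size` are hypothetical names; `secrets.choice` is the Python standard-library call for cryptographically secure selection.

```python
import secrets
from typing import List, Sequence

# Hypothetical enrollment set mirroring the example in the text: the numbers one through nine.
ENROLLED_PASSWORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def random_prompt(enrolled: Sequence[str], count: int = 3) -> List[str]:
    """Choose `count` enrolled passwords at random; repeats are allowed (an assumption)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if not enrolled:
        raise ValueError("no enrolled passwords")
    return [secrets.choice(enrolled) for _ in range(count)]


def prompt_space_size(num_enrolled: int, count: int) -> int:
    """Distinct prompts when every position is drawn independently from the enrolled set."""
    return num_enrolled ** count
```

With nine passwords and three-word prompts, `prompt_space_size(9, 3)` is 729, so an imposter is unlikely to have the exact sequence pre-recorded. The drawback noted next still applies: with a recording of each of the nine individual words, any of those 729 prompts can be spliced together.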
Nevertheless, the method taught by Naik has some drawbacks. For example, an imposter who wiretaps an authorized user's phone may eventually be able to collect recordings of each of the randomly prompted passwords and replay the correct password(s) quickly enough during verification to gain access. Moreover, in some settings, an authorized user may purposefully attempt to defraud the ASR system using a recording of his/her own voice. For example, where a convict is subject to home detention, he/she may record all of the random passwords in his/her own voice. Then, when the ASR system calls to ensure that the convict is at home at a prescribed time, a cohort could play back the correct password and defraud the system.
What is needed is a reliable system and method to detect the use of a recorded voice over a communications channel.
What is needed is a reliable system and method to prevent fraudulent access to ASR-protected systems using the recorded voice of an authorized user.
SUMMARY OF THE INVENTION
The method and apparatus of the present invention provide significant improvements over the prior art. The present invention employs a variety of techniques that, alone or in combination, provide a reliable system for detecting the use of a recorded voice over a communications channel. Further, the present invention can be employed to improve the ability of both text-dependent and text-independent ASR systems to detect the fraudulent use of a recorded voice. The present invention improves on the prior art by employing, alone or in combination, techniques and modules that: (1) analyze the temporal characteristics of the user's speech; (2) analyze the characteristics of the channel over which the user's voice is transmitted; (3) train a pattern classifier to recognize the difference between live and recorded speech; and (4) employ an “audio watermark” to detect use of a recording of a previous enrollment or verification attempt.
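The summary does not say how the audio watermark in item (4) is constructed. One generic technique, swapped in here purely for illustration, is a low-amplitude pseudo-random (spread-spectrum) sequence: each session gets its own seed, and a recording made during that session would carry its watermark. A later attempt is then tested by correlating it against the watermarks of past sessions. The sketch assumes sample-aligned signals with no channel delay or filtering; all function names, the 0.01 amplitude, and the 0.5 threshold are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def make_watermark(seed: int, length: int, amplitude: float = 0.01) -> List[float]:
    """Pseudo-random +/-amplitude sequence; the seed identifies one session (hypothetical)."""
    rng = random.Random(seed)
    return [amplitude * rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed_watermark(signal: Sequence[float], mark: Sequence[float]) -> List[float]:
    """Add the watermark sample by sample (assumes equal lengths and no channel delay)."""
    return [s + m for s, m in zip(signal, mark)]


def watermark_score(signal: Sequence[float], mark: Sequence[float]) -> float:
    """Correlation normalized by the mark's energy: near 1.0 if present, near 0.0 if absent."""
    energy = sum(m * m for m in mark)
    if energy == 0.0:
        return 0.0
    return sum(s * m for s, m in zip(signal, mark)) / energy


def find_replayed_session(signal: Sequence[float], past_seeds: Sequence[int],
                          threshold: float = 0.5) -> Optional[int]:
    """Return the seed of an earlier session whose watermark appears in `signal`, if any."""
    for seed in past_seeds:
        mark = make_watermark(seed, len(signal))
        if watermark_score(signal, mark) >= threshold:
            return seed
    return None
```

Because the correlation grows with signal length while speech contributes only zero-mean noise, a long enough sample separates "watermark present" from "absent" even at low amplitude.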
1. Temporal Characteristics—Summary
Most people cannot naturally repeat a word or phrase exactly the same way. Although the human ear may not be able to hear the difference when an individual repeats a particular word, slight changes in the indi
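The premise above (live repetitions always differ slightly, whereas replaying a recording of an earlier attempt reproduces it almost exactly) suggests a near-duplicate test. The excerpt is cut off before the patent's actual temporal analysis, so this is only a generic sketch: it compares an attempt's feature frames with stored earlier attempts using dynamic time warping (DTW), swapped in as an illustrative alignment method, and flags any match closer than a hypothetical `min_natural_variation`. The threshold depends on feature scaling and channel noise; a replay through a phone line will not be bit-identical, which is why a small positive threshold is used rather than exact equality.

```python
import math
from typing import Sequence

Frame = Sequence[float]


def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Dynamic time warping distance between two frame sequences, normalized by len(a) + len(b)."""
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between two frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def looks_like_replay(attempt: Sequence[Frame], previous_attempts: Sequence[Sequence[Frame]],
                      min_natural_variation: float = 0.05) -> bool:
    """Flag an attempt that is suspiciously close to any stored earlier attempt."""
    return any(dtw_distance(attempt, prev) < min_natural_variation for prev in previous_attempts)
```

An exact copy of a stored attempt has distance 0 and is flagged; a genuinely new utterance of the same phrase, with slightly different timing and spectra, falls above the threshold.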
Mammone Richard J.
Sharma Manish
Korzuch William
McFadden Susan
Merchant & Gould
T-NETIX, Inc.
Young Thomas H.