Reexamination Certificate
1998-12-16
2001-05-15
Dorvil, Richemond (Department: 2641)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Recognition
C704S255000
active
06233556
ABSTRACT:
FIELD OF THE INVENTION
This invention relates to voice processing and verification in automatic interactive telephonic systems. More specifically, the invention relates to an improved technique for telephonic voice processing and verification which may be utilized in a voice processing system while accounting for differences in transmitting telephone equipment or channels.
BACKGROUND OF THE INVENTION
A variety of tasks are necessary in speech systems. Speech recognition is the problem of an automated system listening to speech, regardless of the speaker, and determining the words or message that is spoken. Speaker identification is the problem of listening to speech and determining which one of a group of known speakers is generating the speech. In speaker verification, the user claims to be a particular person and the system determines whether they are indeed that person.
In previous systems, a user entered a password using numeric processing modules and a keypad recognition system. The user gained access to the voice system by keying a predetermined string of numbers, a code, on the telephone keypad. The code length may vary, depending on the system configuration. A numeric processing module in the telephonic voice processing system identifies the user through this code. Each user of the telephonic voice processing system has a separate and distinct code which uniquely identifies that user to the system. This type of configuration suffers from several well known drawbacks. For example, such systems are not intuitive and require a user to remember a sequence of numerical codes.
More recently, a user gained access to a system using a voice processing and verification system. FIG. 1 shows a conventional voice processing and verification system. Telephone lines 100 are coupled with one or more voice processing modules 101 which each include a voice processing server 102. Each of the voice processing modules 101 is linked to a common memory 103. An incoming telephone call is either from a new user or a current user. In some systems, if the user is new to a system, the user is prompted by the voice processing server 102 to identify that fact to the system by pushing a particular digit on the touchtone telephone keypad. This sends a newuser signal to the voice processing server 102 identifying the caller as a new user to the system. If the voice processing server 102 detects a newuser signal, the user's voice is then recorded by the voice processing server 102, converted to a digital signal, and digitally stored in memory 103. This is sometimes referred to as the enrollment process.
The enrollment process involves taking a sample of the user's voice over a set interval of time. This enrollment and verification process is exemplary only; other processes may be present in the prior art. Telephonic voice processing and verification systems typically involve an enrollment process whereby a new user initially gains entry to the system by recording a model of an enrollment voice sample. This enrollment voice sample may consist of a single word but preferably is a group of words. The model of the enrollment voice sample is digitally processed and recorded in the memory 103. Models of enrollment voice samples are also stored for the other users of the system. A user is then able to gain access to the system on subsequent occasions through a comparison with each of the models of their enrollment voice sample stored in memory 103.
If the user is a current user, and not a new user to the telephonic voice processing system, the user will not enter any digits from his telephone keypad when prompted by the system. The user is first prompted by the voice processing server 102 to identify himself/herself. If known, the user's incoming voice is digitally processed by the voice processing server 102 and stored in a buffer 104. The telephonic voice verification system then compares the stored incoming voice sample with each of the enrollment voice models which are stored in memory 103. If the stored incoming voice signal matches the enrollment voice model retrieved from the memory 103, within a predetermined threshold, the user gains access to the system. If the user is not known to the system, a newuser signal is generated.
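The comparison step of such a conventional system can be illustrated with a minimal sketch. It assumes, purely for illustration, that each voice model is a short list of numeric features and that a match is scored by Euclidean distance; the text does not specify the feature representation or the scoring method, and the names `distance`, `verify`, and `enrolled` are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(incoming, enrolled_models, threshold):
    """Compare the buffered incoming sample with every enrolled model.

    Returns the best-matching user ID if it falls within the threshold,
    otherwise None, which stands in for the newuser signal.
    """
    best_id, best_dist = None, float("inf")
    for user_id, model in enrolled_models.items():
        d = distance(incoming, model)
        if d < best_dist:
            best_id, best_dist = user_id, d
    return best_id if best_dist <= threshold else None

# Hypothetical enrollment models held in the common memory.
enrolled = {"alice": [1.0, 2.0, 3.0], "bob": [4.0, 0.5, 1.0]}

result_known = verify([1.1, 2.0, 2.9], enrolled, threshold=0.5)
result_unknown = verify([9.0, 9.0, 9.0], enrolled, threshold=0.5)
```

Here `result_known` is the ID of the closest enrolled user, and `result_unknown` is `None` because no enrolled model lies within the threshold.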
Often, in a telephonic voice verification system with multiple users, a comparison may result in a false rejection or false acceptance. A false rejection occurs when the user is denied access to the system when they should be granted access. A false acceptance occurs when the user is allowed access when it should be denied. One common cause of false rejection and false acceptance is variation in the stored incoming voice signal attributable to noise and/or signal variations caused by differing telephonic equipment. For example, an enrollment voice model recorded from an initial incoming telephone call made over a carbon button telephone is likely to differ significantly from a subsequent incoming voice signal that comes from a cellular telephone or an electret telephone.
Common telephone types include carbon button, cellular and electret. Each of these types of telephones introduces a different type of noise or other signal modification. It is well known that users sound different over these different types of telephony equipment. A person receiving a call from another person they know well will recognize differences in the sound of the caller's voice when made from different types of equipment. Such changes to the received signal can cause an automated system to reject a known user. For example, consider a user that provides the enrollment voice sample from a carbon button type phone at their desk. If the same user calls back later from a cellular phone, the user might be rejected because of variances introduced by the equipment differences. This problem could be overcome by changing the threshold levels required for a match in verification; however, such a course of action would lead to increased occurrences of false acceptances. Therefore, what is needed is an improved voice processing and verification system which can account for these variations.
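The trade-off between the two error types when the threshold is changed can be shown with a small sketch. The distance scores below are invented for illustration only, and the function name `error_rates` is hypothetical; the sketch assumes that a smaller distance means a closer match.

```python
def error_rates(genuine_distances, impostor_distances, threshold):
    """Return (false rejection rate, false acceptance rate) at a given threshold."""
    false_rejections = sum(1 for d in genuine_distances if d > threshold)
    false_acceptances = sum(1 for d in impostor_distances if d <= threshold)
    frr = false_rejections / len(genuine_distances)
    far = false_acceptances / len(impostor_distances)
    return frr, far

# Hypothetical scores: the last two genuine attempts come from a different
# handset than the one used at enrollment, so they score worse.
genuine = [0.2, 0.3, 0.9, 1.1]
impostor = [0.8, 1.0, 1.5, 2.0]

strict = error_rates(genuine, impostor, 0.5)  # rejects the cross-handset attempts
loose = error_rates(genuine, impostor, 1.2)   # accepts them, and some impostors too
```

With these made-up numbers, loosening the threshold removes the false rejections but admits half of the impostor attempts, which is the effect described above.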
SUMMARY OF THE INVENTION
The invention is a voice processing and verification system which accounts for variations dependent upon telephony equipment differences. Models are developed for the various types of telephony equipment from many users speaking on each of the types of equipment. A transformation algorithm is determined for making a transformation between each of the various types of equipment to each of the others. In other words, a model is formed for carbon button telephony equipment from many users. Similarly, a model is formed for electret telephony equipment from many users, and for cellular telephony equipment from many users. Models can also be formed for any other type of equipment, such as telephone headsets, personal computer microphones and the like.
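One way to picture the equipment models and the transformations between them is the sketch below. It rests on strong assumptions that are not stated in the text: each equipment model is taken to be the mean feature vector over many users, and the transformation between two equipment types is taken to be a simple additive offset between those means. The names `mean_vector`, `build_transforms`, and `apply_transform` are hypothetical.

```python
def mean_vector(vectors):
    """Average a list of equal-length feature vectors (assumed form of an equipment model)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_transforms(equipment_models):
    """Build one transformation for every ordered pair of equipment types.

    Assumption: the transformation is the offset between the two model means.
    """
    transforms = {}
    for src, src_mean in equipment_models.items():
        for dst, dst_mean in equipment_models.items():
            if src != dst:
                transforms[(src, dst)] = [d - s for s, d in zip(src_mean, dst_mean)]
    return transforms

def apply_transform(model, offset):
    """Shift a user's voice model from one equipment type to another."""
    return [m + o for m, o in zip(model, offset)]

# Hypothetical pooled features from many users speaking on each equipment type.
pooled = {
    "carbon_button": [[1.0, 2.0], [3.0, 4.0]],
    "electret": [[2.0, 2.0], [4.0, 6.0]],
    "cellular": [[0.0, 1.0], [2.0, 1.0]],
}
equipment_models = {name: mean_vector(vs) for name, vs in pooled.items()}
transforms = build_transforms(equipment_models)

# A model enrolled on a carbon button phone, moved into the cellular domain.
transformed = apply_transform([5.0, 5.0], transforms[("carbon_button", "cellular")])
```

Three equipment types yield six ordered transformations, one from each type to each of the others. A real system would likely use a richer statistical model than a mean offset.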
During an enrollment, a user speaks to the system. The system forms and stores a model of the user's speech. The type of telephony equipment used in the original enrollment session is also detected and stored along with the enrollment voice model. The system determines the type of telephony equipment being used based upon the spectrum of sound it receives. The telephony equipment type determination is based upon models formed for each of the telephony equipment types spoken by many different users.
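The enrollment step with equipment detection can be sketched as follows. This is an illustration under assumptions: the received spectrum and each equipment model are represented as short lists of band energies, and the detected type is simply the nearest equipment model by Euclidean distance. The text does not specify how the spectral comparison is carried out, and the names `detect_equipment`, `enroll`, and `store` are hypothetical.

```python
import math

def detect_equipment(spectrum, equipment_models):
    """Pick the equipment type whose model is closest to the received spectrum."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(equipment_models, key=lambda name: dist(spectrum, equipment_models[name]))

def enroll(user_id, voice_model, spectrum, equipment_models, store):
    """Store the user's voice model together with the detected equipment type."""
    store[user_id] = {
        "model": voice_model,
        "equipment": detect_equipment(spectrum, equipment_models),
    }

# Hypothetical per-equipment spectral models built from many users.
equipment_models = {
    "carbon_button": [0.9, 0.4, 0.1],
    "electret": [0.6, 0.6, 0.5],
    "cellular": [0.3, 0.7, 0.2],
}

store = {}
enroll("alice", [1.0, 2.0, 3.0], [0.85, 0.45, 0.15], equipment_models, store)
```

After the call, `store["alice"]` holds both the enrollment voice model and the equipment type detected during enrollment, so a later call from different equipment can be recognized as a mismatch.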
Thereafter, when a current user calls in, his/her voice will be compared to the stored model if the same telephony equipment as used in the enrollment is determined. If the user calls in on another type of equipment than that used during the enrollment, the transformation for telephony equipment is applied to the model. The user's voice is then verified against the transformed model. This improves the error rate resulting from different telephony equi
Shahshahani Ben
Teunen Remco
Dorvil Richemond
Haverstock & Owens LLP
Nuance Communications