Reexamination Certificate
1997-04-15
2001-05-22
Weaver, Scott L. (Department: 2748)
Telephonic communications
Audio message storage, retrieval, or synthesis
Voice activation or recognition
C379S088010, C379S088100
active
06236715
BACKGROUND OF THE INVENTION
Generally, this invention relates to the field of automatic speech recognition technology, and in particular, to an apparatus and method for transmitting speech signals over a control channel in a telecommunications system to initiate calls.
Conventional telephone systems use speech recognition technology to enable voice-activated dialing services and voice-activated directory assistance. With these systems, a directory receives a spoken name, a speech recognition process recognizes the received name, and system elements use the recognized name to find the corresponding telephone number. Once the number is located, a call is launched to the desired destination. Longstanding problems with such systems, however, have limited their performance in terms of both accuracy and computational speed. Further, to ensure the most accurate speech recognition, conventional systems and methods must transmit the entire speech signal “in-band,” which requires telecommunication data channels because of the signal's high bandwidth.
In conventional telephone networks, control, or signaling, channels transmit control information for establishing terminal links (session set-ups), terminating terminal links (session tear-downs), etc. In contrast, data channels carry data, or media-type, signals such as voice and video transmissions. Control channels operate at a much lower data rate than the data channels because the control information requires less bandwidth than media-type data signals. In most cases, signaling information is transmitted over a control channel at around 8 or 16 kbps, while data information is transmitted at around 64 kbps. In addition, data channels occupy a greater portion of a communication line's capacity, and thereby limit the number of calls a particular transmission line can accommodate.
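The gap between the two channel types can be made concrete with a little arithmetic. The sketch below uses only the approximate rates quoted above (8 or 16 kbps for control, 64 kbps for data); the 2-second utterance length and the function names are assumptions chosen for illustration, not figures from the patent.

```python
# Approximate rates quoted in the text (1 kbps taken as 1000 bit/s).
CONTROL_KBPS = (8, 16)
DATA_KBPS = 64


def bits_for(duration_s: float, rate_kbps: float) -> int:
    """Number of bits carried in duration_s seconds at rate_kbps."""
    return int(duration_s * rate_kbps * 1000)


def transfer_time_s(payload_bits: int, rate_kbps: float) -> float:
    """Seconds needed to move payload_bits over a channel of rate_kbps."""
    return payload_bits / (rate_kbps * 1000)


if __name__ == "__main__":
    utterance_s = 2.0  # hypothetical length of a spoken name
    raw_bits = bits_for(utterance_s, DATA_KBPS)
    print(f"raw speech: {raw_bits} bits")
    for rate in CONTROL_KBPS:
        # Time to push the unreduced speech signal through a control channel.
        print(f"{rate} kbps control channel: {transfer_time_s(raw_bits, rate)} s")
```

Under these assumptions the unreduced signal would take several times longer than the utterance itself to cross a control channel, which is why a full speech signal is normally sent in-band.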
Other speech recognition systems perform the entire speech recognition process locally and dial a number based on the result. These systems use a telephone terminal that can perform the three basic stages of speech recognition: feature extraction, pattern classification, and decision logic. In the first stage, relevant characteristics of the speech signal are extracted. The later stages use the extracted features to correlate the spoken name with a previously stored name template. A database lookup is then performed to retrieve a telephone number corresponding to the recognized name.
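The three stages and the final database lookup can be sketched as a small pipeline. This is a toy illustration only: the per-frame features (energy and zero-crossing count), the mean Euclidean distance score, the acceptance threshold, and the directory entries are all assumptions made for the example, not techniques described in the patent.

```python
import math


def extract_features(samples, frame_len=4):
    """Stage 1: reduce the signal to per-frame (energy, zero-crossing count)."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        features.append((energy, float(crossings)))
    return features


def classify(features, templates):
    """Stage 2: score each stored name template by mean Euclidean distance."""
    scores = {}
    for name, template in templates.items():
        n = min(len(features), len(template))
        if n == 0:
            continue  # nothing to compare against
        total = sum(math.dist(features[i], template[i]) for i in range(n))
        scores[name] = total / n
    return scores


def decide(scores, max_distance=0.5):
    """Stage 3: accept the best match only if it is close enough."""
    if not scores:
        return None
    best = min(scores, key=scores.get)
    return best if scores[best] <= max_distance else None


# Hypothetical directory used for the final lookup step.
DIRECTORY = {"alice": "555-0100", "bob": "555-0199"}


def lookup_number(samples, templates, directory=DIRECTORY):
    """Run all three stages, then retrieve the number for the recognized name."""
    name = decide(classify(extract_features(samples), templates))
    return directory.get(name) if name is not None else None
```

A terminal that performs full local recognition has to carry every part of this pipeline, including the templates and the directory, which is the cost discussed in the following paragraphs.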
Systems employing this solution are currently expensive and impractical to implement. One drawback of such systems is that every telephone terminal capable of providing full speech recognition must be able to perform the entire speech recognition process locally before setting up or initiating a call. This requirement forces the terminal to contain both the hardware and software to perform all three phases of the speech recognition process.
The terminal also requires access to a database of recognizable names or speech patterns. The more names the speech processor can recognize, the greater and more practical the benefit to the individual user. In the past, this goal has been accomplished by allowing the user to train the speech processor to recognize certain speech patterns and recalling these patterns when a voice-dialing request was made. Alternatively, preprogramming the processor with a number of “templates” allows multiple users to implement voice-dialing from the same terminal. The resulting terminal in both scenarios, however, is expensive and usually has limited voice recognition capabilities.
Other solutions have been proposed to overcome the problems associated with local speech recognition. For example, U.S. Pat. No. 5,488,652 (Bielby et al.) discloses a method and apparatus for training a speech recognition algorithm for directory assistance applications. This allows terminal users to send their voice-activated dialing requests to a remote speech recognition server. With the system disclosed in Bielby et al., the user speaks a name into a receiver at a standard terminal interface and, upon receiving the speech/voice signal, the remote server performs the entire speech recognition process and initiates the desired call. That system, however, requires a high-bandwidth data channel to transmit the speech signal received from the user.
In addition to transmitting the entire speech signal in-band over the data channel, systems such as Bielby et al. require the call to be “set up” through an analog channel bank or digital interface prior to processing the call information. Call set-up is a procedure used between the call routing switch and the telephone terminal elements. The procedure uses a protocol and switching mechanism that operate jointly to negotiate the set-up and establish the connection between parties. For example, if A places a call to B, A would send a call-request message to the switch with B as the destination number. The switch would then check the status of B and, if B is not busy, send a call-initiate message to A (at which point A hears ringing) and a call-setup message to B (at which point B's phone starts ringing). When B picks up, a call-accept message is sent from B to the switch. At this point, the switch completes the connection, switching the call, and changes its internal state to show that both A and B are busy.
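The A-to-B exchange described above can be sketched as a tiny switch model. The class, its method names, and the "busy" reply are hypothetical and chosen for illustration; only the call-request, call-initiate, call-setup, and call-accept messages come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy call-routing switch that tracks busy parties and logs messages."""
    busy: set = field(default_factory=set)
    pending: dict = field(default_factory=dict)  # callee -> caller
    log: list = field(default_factory=list)      # (message, sender, receiver)

    def call_request(self, caller, callee):
        """Caller asks the switch to reach callee; returns True if ringing."""
        self.log.append(("call-request", caller, "switch"))
        if callee in self.busy or callee in self.pending:
            # "busy" is an assumed message name, not taken from the text.
            self.log.append(("busy", "switch", caller))
            return False
        self.log.append(("call-initiate", "switch", caller))  # caller hears ringing
        self.log.append(("call-setup", "switch", callee))     # callee's phone rings
        self.pending[callee] = caller
        return True

    def call_accept(self, callee):
        """Callee picks up; the switch completes the connection."""
        self.log.append(("call-accept", callee, "switch"))
        caller = self.pending.pop(callee)  # KeyError if nothing was ringing
        self.busy.update({caller, callee})  # both parties are now busy
        return caller, callee


if __name__ == "__main__":
    switch = Switch()
    if switch.call_request("A", "B"):
        switch.call_accept("B")
    for message in switch.log:
        print(message)
```

In this sketch the connection exists only after call-accept, which mirrors the point made in the text: the whole set-up exchange has to finish before any in-band speech can reach a remote recognizer.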
In digital telephone systems, digital interfaces such as T1, DS30, or other proprietary mechanisms, provide the protocol and switching mechanisms necessary for call set-up between the user and the remote speech processor. Normally, call set-up is required to establish a complete connection because the remote processor needs the entire speech signal before speech recognition can occur.
In the alternative, allowing a user to transmit voice-activated dialing requests “out of band,” over a lower bandwidth control channel, would eliminate the need for the call to be set up prior to the speech recognition process. As a consequence, the digital interface between the user and the speech recognition processor could also be eliminated, which in turn would result in significant cost and equipment savings.
SUMMARY OF THE INVENTION
Voice-activated dialing systems and methods consistent with the present invention can improve efficiency and effectiveness by performing a feature extraction procedure locally at the telephone terminal. In addition, the signal produced by the feature extraction procedure can be transmitted at a lower bandwidth on the control, or signaling, channels of the telephone system, thereby eliminating interface equipment normally required to set up a call before the speech recognition process occurs.
A voice-activated dialing method consistent with this invention includes the steps of: receiving, at a telephone terminal, an audio signal corresponding to a desired destination; conditioning the audio signal to extract a set of semantic feature characteristics; forming a feature signal representing the semantic features characteristic of the received audio signal; transmitting the feature signal to a remote speech recognition processor; and retrieving, with the remote speech recognition processor, a name corresponding to the feature signal.
A telephone terminal apparatus consistent with this invention includes means for receiving an audio signal corresponding to a desired destination; means for conditioning the audio signal to extract a set of semantic feature characteristics; means for forming a feature signal corresponding to the semantic features characteristic of the received audio signal; and means for transmitting the feature signal to a remote speech recognition processor.
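The division of work between the terminal and the remote processor can be sketched as follows. The wire format (a two-byte count followed by one byte per feature), the assumption that features lie in the range 0.0 to 1.0, and the nearest-template lookup are all hypothetical choices made for this example; the summary does not specify how the feature signal is encoded or matched.

```python
import struct


def form_feature_signal(features):
    """Terminal side: quantize features in [0.0, 1.0] to one byte each.

    The payload is a 2-byte big-endian count followed by the quantized bytes.
    """
    quantized = [max(0, min(255, round(f * 255))) for f in features]
    return struct.pack("!H", len(quantized)) + bytes(quantized)


def parse_feature_signal(payload):
    """Remote side: recover the approximate feature values from the payload."""
    (count,) = struct.unpack("!H", payload[:2])
    body = payload[2:2 + count]
    if len(body) != count:
        raise ValueError("truncated feature signal")
    return [b / 255 for b in body]


def retrieve_name(features, templates):
    """Remote side: return the stored name whose template is nearest."""
    def squared_error(template):
        return sum((a - b) ** 2 for a, b in zip(features, template))

    return min(templates, key=lambda name: squared_error(templates[name]))


if __name__ == "__main__":
    # Hypothetical stored templates on the remote speech recognition processor.
    templates = {"alice": [0.0, 0.5, 1.0], "bob": [1.0, 0.5, 0.0]}
    payload = form_feature_signal([0.1, 0.5, 0.9])  # sent over the control channel
    print(len(payload), "bytes ->", retrieve_name(parse_feature_signal(payload), templates))
```

The point of the split is that only the compact payload crosses the network, while the templates and the name lookup stay on the remote processor.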
Additional advantages of the invention will be set forth in part in the description that follows, and in part will be obvious from the description, or may be learned by practicing the invention. Both the foregoing general description and the following detailed description are exemplary and explanatory, and are intended to provide a further explanation of the invention claimed.
Finnegan Henderson Farabow Garrett & Dunner L.L.P.
Nortel Networks Corporation