Reexamination Certificate
1999-10-26
2003-09-09
Abebe, Daniel (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S246000, C379S088010
active
06618703
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to speech processing in general and to an application programming interface (API) to a speech processing system in a telephony environment in particular.
GLOSSARY
API—Application Programming Interface
Calibration—A preliminary algorithmic process that calibrates the speech processing system. Some systems require it for correct operation; when required, it is usually performed once, at installation, before any training is done.
Calibration Formula (CF)—The outcome of the calibration process. In some cases, the calibration formula is required as an input to subsequent phases of speech processing.
Calibration Set (CS)—The input to the calibration process.
Calling Application—An application that uses the API in order to receive speech processing services from the speech processing system.
Persistent Object—Data used during the speech processing which needs to be stored, either internally or externally. For example, in the case of speaker verification, the persistent objects include the calibration set, the calibration formula, the voice signatures and the verification formulas.
Speech Processing—Any algorithmic processing on spoken data, including, but not limited to, speaker verification, speaker recognition, word recognition, speech compression and silence removal.
Telephony Environment—An environment in which conversations pass through a telephony medium, for example, E1/T1 trunks, modems and analog telephone extensions. The term also includes network environments such as the Internet.
Training—The algorithmic process of learning from specific data in order to perform a particular task. In the case of speaker verification, the input to training is a collection of the speaker's audio segments known as the speaker's voice signature (VS). In the case of word recognition, the input to training is a “word signature”.
Verification Formula (VF)—The output of the training in the case of speaker verification. The verification formula is used in the algorithmic process of speaker verification.
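For speaker verification, the glossary describes a chain of persistent objects. The calibration set (CS) produces the calibration formula (CF). Training on a speaker's voice signature (VS), using the CF where one is needed, then produces a verification formula (VF).

The following Python sketch models only that data flow. The class names and field layouts are hypothetical placeholders. So is the simple averaging that stands in for real calibration and training; it is not the algorithm of the described system.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical types standing in for the glossary's persistent objects.
@dataclass
class CalibrationFormula:
    offset: float  # illustrative parameter only


@dataclass
class VoiceSignature:
    speaker_id: str
    segments: List[List[float]]  # the speaker's audio segments


@dataclass
class VerificationFormula:
    speaker_id: str
    template: float  # placeholder for a trained speaker model


def calibrate(calibration_set: List[List[float]]) -> CalibrationFormula:
    # Placeholder "calibration": mean sample value over the calibration set (CS).
    samples = [s for seg in calibration_set for s in seg]
    return CalibrationFormula(offset=sum(samples) / len(samples))


def train(vs: VoiceSignature, cf: CalibrationFormula) -> VerificationFormula:
    # Placeholder "training": mean calibrated sample value of the voice signature (VS).
    samples = [s - cf.offset for seg in vs.segments for s in seg]
    return VerificationFormula(vs.speaker_id, sum(samples) / len(samples))
```

For example, `train(VoiceSignature("alice", [[4.0, 6.0]]), calibrate([[1.0, 3.0]]))` yields a verification formula whose `template` is `3.0`. Each object returned here is a persistent object in the glossary's sense, so a real system would need to store it.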
BACKGROUND OF THE INVENTION
Speech processing technologies are known in the art. Speech processing products are available, for example, from Nuance Communications of Menlo Park, Calif., USA and Lernout & Hauspie Speech Products N.V. of Belgium.
Generally, systems providing speech processing services are integrated into other applications, and therefore there is a need for an interface between a speech processing system 100 and a calling application 102, both shown in FIG. 1, to which reference is now made. The designers of calling application 102 use an application programming interface (API) 104 so that calling application 102 may receive services from speech processing system 100. If API 104 is well designed and is adopted by many different vendors of speech processing systems, then the designer of calling application 102 can change from one speech processing system to another without having to change calling application 102.
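The decoupling role of API 104 can be illustrated with an abstract interface that several vendors implement. In this hypothetical Python sketch, `SpeechProcessingAPI`, `VendorA`, `VendorB` and the single `verify` method are illustrative names, and the decision rules are stand-ins. The only point is that `calling_application` depends on the interface, so swapping vendors requires no change to it.

```python
from abc import ABC, abstractmethod


class SpeechProcessingAPI(ABC):
    """Hypothetical vendor-neutral interface playing the role of API 104."""

    @abstractmethod
    def verify(self, speaker_id: str, audio: bytes) -> bool: ...


class VendorA(SpeechProcessingAPI):
    def verify(self, speaker_id: str, audio: bytes) -> bool:
        return len(audio) > 0  # stand-in decision rule


class VendorB(SpeechProcessingAPI):
    def verify(self, speaker_id: str, audio: bytes) -> bool:
        return speaker_id != "" and len(audio) >= 2  # a different stand-in rule


def calling_application(system: SpeechProcessingAPI) -> bool:
    # Calling application 102 depends only on the interface, never on a vendor class.
    return system.verify("alice", b"\x00\x01")
```

Because both vendor classes honor the same interface, `calling_application(VendorA())` and `calling_application(VendorB())` run unchanged.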
I/O Software, Inc. of Riverside, Calif., USA, has produced a biometric API (BAPI) for communication between software applications and biometric devices, such as fingerprint scanners and smart cards encoded with fingerprint biometric information.
The Human Authentication API (HA-API) project is an initiative of the US Government's Department of Defense through the Biometric Consortium. The HA-API specification was prepared by National Registry Inc. of Tampa, Fla., USA.
The Speech Recognition API Committee created a speaker verification API (SVAPI). SVAPI enables the calling application to verify a claimed identity only after the speaker has finished speaking. There are a number of situations that SVAPI is unable to support. For example, it does not support online verification, i.e. verification that is performed while the speaker is speaking.
In a further example, SVAPI does not contain commands for handling the data that is required for training and verification, e.g. the voice signatures and the verification formulas. Rather, SVAPI assumes that the calling application is responsible for handling this data.
In another example, SVAPI does not allow the calling application to set policies relating to the speaker verification, such as the frequency of verification updates, the length of audio for verification and decision policies.
IBM Corporation of Armonk, N.Y., USA, has produced the “Advanced Identification Services C API”, which is intended to be more specific and detailed than HA-API but more general than SVAPI.
SUMMARY OF THE INVENTION
There is provided in accordance with a preferred embodiment of the present invention an application programming interface (API) for enabling a calling application to instruct a speech processing system to perform operations including online audio acquisition and algorithmic speech processing operations. The API includes acquisition interface means for enabling the calling application to instruct the speech processing system to acquire online audio from an external communication channel, and processing interface means for enabling the calling application to instruct the speech processing system to perform at least one of the algorithmic speech processing operations on said acquired audio.
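The two interface means described above can be pictured as two groups of calls on one object. One group tells the system to acquire audio from an external channel; the other tells it to process what was acquired.

The sketch makes several assumptions, none of which come from the patent:
- a hypothetical `SpeechProcessingSystem` class;
- an external channel simulated with an in-memory list of frames;
- a trivial byte-level stand-in for silence removal.

```python
from typing import Dict, List


class SpeechProcessingSystem:
    """Hypothetical system that acquires audio online and then processes it."""

    def __init__(self, channels: Dict[str, List[bytes]]):
        # Maps a channel name to the audio frames arriving on it (simulated).
        self._channels = channels
        self._acquired: Dict[str, bytes] = {}

    # Acquisition interface: acquire online audio from an external channel.
    def acquire(self, channel: str) -> str:
        handle = f"acq-{len(self._acquired)}"
        self._acquired[handle] = b"".join(self._channels[channel])
        return handle

    # Processing interface: run an algorithmic operation on acquired audio.
    def process(self, handle: str, operation: str) -> int:
        audio = self._acquired[handle]
        if operation == "silence_removal":
            # Stand-in: drop zero bytes as "silence" and report the remaining length.
            return len(audio.replace(b"\x00", b""))
        raise ValueError(f"unsupported operation: {operation}")
```

Suppose the system is built with `channels={"trunk1/slot5": [b"\x00ab", b"c\x00"]}`. Then `acquire("trunk1/slot5")` returns the handle `"acq-0"`, and `process("acq-0", "silence_removal")` returns `3`.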
According to another aspect of the present invention, the acquisition interface means and the processing interface means include object-oriented classes.
According to another aspect of the present invention, the external communication channel is selected from a group including a particular time slot of a telephone trunk, a particular telephone extension and an audio file of a remote audio storage.
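The three kinds of external communication channel listed above can be modeled as distinct descriptor types. That way the calling application states exactly which channel the system should acquire from. The type names and fields in this Python sketch (`TrunkTimeSlot`, `TelephoneExtension`, `RemoteAudioFile`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class TrunkTimeSlot:
    trunk: int      # index of a telephone trunk, e.g. an E1/T1 trunk
    time_slot: int  # the particular time slot on that trunk


@dataclass(frozen=True)
class TelephoneExtension:
    number: str


@dataclass(frozen=True)
class RemoteAudioFile:
    location: str   # path or URI of the file in the remote audio storage


ExternalChannel = Union[TrunkTimeSlot, TelephoneExtension, RemoteAudioFile]


def describe(channel: ExternalChannel) -> str:
    # Dispatch on the channel kind; a real system would open the channel here.
    if isinstance(channel, TrunkTimeSlot):
        return f"trunk {channel.trunk}, slot {channel.time_slot}"
    if isinstance(channel, TelephoneExtension):
        return f"extension {channel.number}"
    if isinstance(channel, RemoteAudioFile):
        return f"file {channel.location}"
    raise TypeError(f"unknown channel type: {type(channel).__name__}")
```

Using frozen dataclasses makes the descriptors hashable. A system could therefore key open acquisitions by channel.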
According to another aspect of the present invention, the API further includes provision interface means for enabling the calling application to directly provide the speech processing system with provided audio.
According to another aspect of the present invention, the provision interface means include object-oriented classes.
According to another aspect of the present invention, the provided audio may be a microphone recording or voice over IP (VoIP) data.
According to another aspect of the present invention, the processing interface means also enables the calling application to instruct the speech processing system to perform at least one of the algorithmic speech processing operations on any of the acquired audio, the provided audio and the combination thereof.
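Provided audio differs from acquired audio only in who delivers it: the calling application hands it to the system directly instead of naming a channel. This hypothetical Python sketch keeps both kinds of audio for a session. Processing can then select the acquired audio, the provided audio, or their combination. `AudioStore`, `Source`, and the frame-concatenation model are illustrative assumptions.

```python
from enum import Enum
from typing import Dict, List


class Source(Enum):
    ACQUIRED = "acquired"  # audio the system acquired itself from a channel
    PROVIDED = "provided"  # audio handed over directly by the calling application


class AudioStore:
    """Hypothetical holder for the acquired and provided audio of one session."""

    def __init__(self) -> None:
        self._audio: Dict[Source, List[bytes]] = {Source.ACQUIRED: [], Source.PROVIDED: []}

    def add_acquired(self, frame: bytes) -> None:
        self._audio[Source.ACQUIRED].append(frame)

    # Provision interface: e.g. a microphone recording or VoIP payload.
    def provide(self, frame: bytes) -> None:
        self._audio[Source.PROVIDED].append(frame)

    def select(self, *sources: Source) -> bytes:
        # Processing may run on acquired audio, provided audio, or both combined.
        return b"".join(f for s in sources for f in self._audio[s])
```

For example, after `add_acquired(b"ab")` and `provide(b"cd")`, selecting both sources returns `b"abcd"`.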
According to another aspect of the present invention, the processing interface means includes interface means for enabling the calling application to instruct, during acquisition of the acquired audio, the speech processing system to commence at least one of the algorithmic speech processing operations on the acquired audio.
According to another aspect of the present invention, the processing interface means include interface means for enabling the calling application to instruct the speech processing system to perform at least one of the algorithmic speech processing operations throughout a conversation with a speaker whose audio samples are contained in the acquired audio.
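Online processing means that decisions are produced while audio is still being acquired and keep updating for the rest of the conversation. SVAPI, as noted in the background, only decides after the speaker has finished.

The Python generator below is a minimal sketch of that pattern under a hypothetical scoring rule. Each incoming frame updates a running mean energy, which is compared with a reference value assumed to come from a verification formula. Real speaker verification uses far richer features; only the streaming structure is meant to carry over.

```python
from typing import Iterable, Iterator, List, Tuple


def online_verification(
    frames: Iterable[List[float]],
    reference: float,
    tolerance: float,
    min_frames: int = 2,
) -> Iterator[Tuple[int, bool]]:
    """Yield interim (frames_seen, accepted) decisions while audio is still arriving.

    Hypothetical rule: accept while the running mean frame energy stays within
    `tolerance` of `reference`. Frames are assumed to be non-empty.
    """
    total = 0.0
    count = 0
    for frame in frames:
        # Processing starts with the first acquired frame, not after the call ends.
        total += sum(x * x for x in frame) / len(frame)
        count += 1
        if count >= min_frames:
            # The decision keeps updating throughout the conversation.
            yield count, abs(total / count - reference) <= tolerance
```

Consider the frames `[1.0, 1.0]`, `[1.0, 1.0]`, `[3.0, 3.0]` with `reference=1.0` and `tolerance=0.5`. The generator yields `(2, True)` and then `(3, False)`, so the calling application sees the decision change mid-conversation.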
According to another aspect of the present invention, the speech processing system is capable of performing data management operations including creating, storing and retrieving data objects. The API further includes management interface means for enabling the calling application to instruct the speech processing system to perform at least one of the data management operations.
According to another aspect of the present invention, the management interface means include object-oriented classes.
According to another aspect of the present invention, the speech processing system has an internal data store and the management interface means include interface means for enabling the calling application to instruct the speech processing system to store data in the internal data store and to retrieve the data from the internal data store.
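When the system owns its persistent objects, the calling application only needs create, store and retrieve calls keyed by object kind and identifier. It no longer has to manage voice signatures and verification formulas itself, as SVAPI requires.

The `ManagementInterface` class and its method names in this Python sketch are hypothetical. The internal data store is simulated with an in-memory dictionary. Copies are returned so that callers cannot modify stored objects by accident.

```python
import copy
from typing import Any, Dict, Tuple


class ManagementInterface:
    """Hypothetical management calls over the system's internal data store."""

    def __init__(self) -> None:
        # Internal data store keyed by (object kind, object id), e.g. ("VF", "alice").
        self._store: Dict[Tuple[str, str], Any] = {}

    def create(self, kind: str, object_id: str, data: Any) -> None:
        key = (kind, object_id)
        if key in self._store:
            raise KeyError(f"{kind} {object_id!r} already exists")
        self._store[key] = copy.deepcopy(data)

    def store(self, kind: str, object_id: str, data: Any) -> None:
        # Insert or overwrite the stored copy, e.g. after retraining.
        self._store[(kind, object_id)] = copy.deepcopy(data)

    def retrieve(self, kind: str, object_id: str) -> Any:
        # Return a copy so callers cannot mutate the internal store by accident.
        return copy.deepcopy(self._store[(kind, object_id)])
```

Keying by object kind lets one store hold calibration sets, calibration formulas, voice signatures and verification formulas side by side.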
According to another aspect of
Peres Renana
Shimoni Guy
Eitan, Pearl, Latzer & Cohen Zedek LLP
Persay Inc.