Speech recognition over packet networks

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

C704S201000, C704S243000, C704S256000, C704S241000

active

06195636

ABSTRACT:

FIELD OF THE INVENTION
This invention relates to speech recognition. More particularly, this invention relates to a speech recognition system in which feature extraction is performed at a sending end of a packet network and the remainder of the recognition is performed at the receiving end.
BACKGROUND OF THE INVENTION
A typical speech recognition system operates by breaking an input speech signal down into short segments over time. Each segment is analyzed individually, and the acoustic features that have been found relevant for speech recognition are extracted from it. These extracted features are then matched against reference models for the words in the vocabulary, and the best match is selected.
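The three stages just described (segmentation, per-segment feature extraction, and matching against reference models) can be sketched in Python as below. This is a toy illustration under stated assumptions, not the method of the patent: the single log-energy feature, the frame and hop sizes, and all function names are hypothetical, and the naive frame-by-frame distance stands in for the alignment techniques (such as dynamic time warping or hidden Markov models) that practical recognizers use.

```python
import math
from typing import Dict, List, Sequence


def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Break the signal into overlapping fixed-length segments (frames)."""
    return [list(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]


def log_energy(frame: Sequence[float]) -> float:
    """Toy acoustic feature (assumption): log of the frame energy."""
    return math.log(sum(x * x for x in frame) + 1e-10)


def extract_features(samples: Sequence[float], frame_len: int = 4, hop: int = 2) -> List[float]:
    """Analyze each frame individually and keep one feature per frame."""
    return [log_energy(f) for f in frame_signal(samples, frame_len, hop)]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Naive frame-by-frame squared distance over the shorter sequence."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(samples: Sequence[float], references: Dict[str, List[float]]) -> str:
    """Return the vocabulary word whose reference features match best."""
    feats = extract_features(samples)
    return min(references, key=lambda word: distance(feats, references[word]))


# Usage with hypothetical reference models built from two synthetic signals.
references = {
    "loud": extract_features([1.0] * 12),
    "quiet": extract_features([0.01] * 12),
}
print(recognize([0.9] * 12, references))  # prints: loud
```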
Speech recognition applications in use today include voice-activated dialing (through a telephone company) and dictation software.
In voice-activated dialing through a telephone company, when a user lifts the handset of his telephone, he is connected to a speech recognition server located at the telephone company's exchange. The user then speaks the name of the person he wishes to be connected to, and the server interprets the voice command and performs the connection.
The user is connected to the speech recognition server through a circuit-switched network, and part of the network bandwidth, usually of the order of 8 Kbytes/sec., is reserved for the user for as long as the connection is maintained. In this arrangement, the server performs feature extraction after the incoming speech has been decoded.
Packet networks are replacing the existing Time Division Multiplexing (TDM) based voice networks. In a packet network system 10, as shown in FIG. 1, the speech being sent to the speech recognition server 12 at the telephone exchange will typically be compressed by a speech coder at an access interface 14 at the sending end, to a low rate such as 1 Kbytes/sec. Because the information is actually intended for the recognizer at the exchange, the compressed speech must first be decoded at the exchange and then passed on to the speech recognition server 12, e.g., as PCM samples.
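The rates quoted above follow from simple arithmetic. The sketch below assumes standard telephone-band PCM (8000 samples/s at 8 bits/sample), which gives the 8 Kbytes/sec figure, and compares it with the 1 Kbytes/sec coded rate. The feature-stream parameters (100 frames/s, 13 coefficients, 8 bits each) are purely illustrative assumptions, not values taken from the patent.

```python
def stream_rate_bytes_per_sec(frames_per_sec: float, values_per_frame: int,
                              bits_per_value: int) -> float:
    """Bit rate of a fixed-format stream, converted to bytes per second."""
    return frames_per_sec * values_per_frame * bits_per_value / 8


# Circuit-switched PCM: each "frame" is a single 8-bit sample at 8 kHz.
pcm = stream_rate_bytes_per_sec(8000, 1, 8)        # 8000.0 bytes/s = 8 Kbytes/sec
coded = 1000.0                                      # low-rate speech coder, 1 Kbytes/sec
# Hypothetical feature stream: 13 coefficients per 10 ms frame, 8 bits each.
features = stream_rate_bytes_per_sec(100, 13, 8)   # 1300.0 bytes/s

print(f"PCM {pcm:.0f} B/s, coded {coded:.0f} B/s, ratio {pcm / coded:.0f}:1")
print(f"8-bit feature example: {features:.0f} B/s")
```

The comparison only shows orders of magnitude; the actual feature rate depends on the analysis and quantization chosen.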
This type of system has at least two disadvantages, namely:
The computational load on the telephone exchange server is increased, since it has to first decode the input to speech samples, and then perform all the steps of speech recognition.
The design of compression algorithms for telephony is based on perceptual criteria of voice quality. However, these criteria do not necessarily preserve the performance of the speech recognition system, so recognition accuracy may suffer even when the decoded speech sounds acceptable to a listener.
In summary, speech recognition typically requires a number of steps or stages, and current speech recognition systems perform all of these steps at the same location. Such systems have a problem when the user is remote from the speech recognition system and is connected to it through a system that compresses the user's speech before transmitting it, e.g., a packet network.
Remotely transmitted speech is typically compressed before being sent over a packet network, in order to save transmission time and storage space. However, speech compression algorithms are generally designed to trade off space savings against human comprehension; they are not designed for compressing acoustic features. That is, they compress speech data as much as possible while still allowing a listener at the receiving end to understand the decompressed speech. What present systems fail to take into account is that sometimes speech recognition equipment and processing are located at the receiving end. In those cases, the losses caused by compressing and decompressing the speech data may degrade speech recognition. One way to overcome this problem is, of course, to transmit uncompressed speech, but this increases the load on the network.
In addition to the problems described above, a further problem arises when a system combines speech recognition with normal speech transmission. In such cases, e.g., in a telephone system, the system would need separate compression algorithms: one for speech (and its associated acoustic features) that is to be recognized, and one for speech that is not.
SUMMARY OF THE INVENTION
This invention solves the above and other problems by providing a system in which the speech recognition process is broken down into two parts, namely speech feature extraction and the remainder of the speech recognition process. Further, the system automatically determines whether or not to use a speech coding algorithm, based on whether or not speech recognition is to be performed at the remote end.
Accordingly, in one aspect, this invention is a speech recognition system including user equipment connected to a packet network, and a speech recognition application server connected to the packet network for performing speech recognition on speech data that corresponds to speech input to the user equipment and is transmitted to the speech recognition application server via the packet network. The user equipment selectively performs partial speech recognition, specifically feature extraction, on the speech before sending the speech data to the speech recognition application server. Preferably, partial speech recognition (i.e., feature extraction) is performed only if speech recognition is to be performed at the receiving end.
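The selective behaviour of the user equipment can be sketched as a single branch. In this minimal Python sketch, the flag `recognition_at_receiver` is assumed to be already known to the user equipment (the text states only that the determination is automatic, not how it is signalled), and `extract_features`, `compress_features` and `speech_codec` are hypothetical callables standing in for the feature front end, the feature quantizer and an ordinary perceptual speech coder.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Payload:
    kind: str    # "features" for a recognizer, "coded_speech" for a listener
    data: bytes


def prepare_payload(
    samples: Sequence[float],
    recognition_at_receiver: bool,
    extract_features: Callable[[Sequence[float]], List[float]],
    compress_features: Callable[[List[float]], bytes],
    speech_codec: Callable[[Sequence[float]], bytes],
) -> Payload:
    """Decide at the sending end what is sent over the packet network."""
    if recognition_at_receiver:
        # Partial recognition: run feature extraction locally and send
        # compressed features instead of perceptually coded speech.
        return Payload("features", compress_features(extract_features(samples)))
    # No recognizer at the far end: use the normal speech coder.
    return Payload("coded_speech", speech_codec(samples))
```

With this split, the server receiving a `"features"` payload only has to run the remaining recognition stages, with no speech decoding step.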
In some embodiments, the partial speech recognition performed by the user equipment includes feature extraction from the speech, and the speech data comprises these features. The feature extraction includes at least one of cepstral analysis, spectral analysis and perceptual analysis, and the extracted features are compressed before being transmitted over the packet network. The features are compressed using at least one of linear quantization and vector quantization.
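Of the feature types listed, cepstral analysis is the easiest to show compactly. The sketch below computes the real cepstrum of one frame (the inverse DFT of the log magnitude spectrum) after an optional Hamming window, using a direct O(N²) DFT so that only the standard library is needed. The windowing, the number of retained coefficients and the function names are assumptions; the text does not say which cepstral variant is used, and practical front ends commonly use mel-frequency cepstral coefficients computed with an FFT.

```python
import cmath
import math
from typing import List, Sequence


def hamming(n: int) -> List[float]:
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]


def dft(x: Sequence[float]) -> List[complex]:
    """Direct discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame: Sequence[float], num_coeffs: int, window: bool = True) -> List[float]:
    """Return the first num_coeffs real-cepstrum coefficients of one frame."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))] if window else list(frame)
    log_mag = [math.log(abs(c) + 1e-10) for c in dft(x)]
    # Inverse DFT of the log magnitude; the result is real for a real frame,
    # so only the real part is kept.
    ceps = [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]
    return ceps[:num_coeffs]
```

Each frame from the segmentation stage would be passed through `real_cepstrum`, and the resulting short coefficient vectors, rather than the waveform, would then be compressed and transmitted.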
In another aspect, this invention is a method for use in a system in which user equipment is connected to a packet network and a speech recognition application server, also connected to the packet network, performs speech recognition on speech data. The method includes inputting speech to the user equipment, and the user equipment selectively performing partial speech recognition on the speech before sending the speech data to the speech recognition application server. Preferably, partial speech recognition is performed only if speech recognition is to be performed at the receiving end.
The partial speech recognition performed by the user equipment may include feature extraction from the speech, in which case the speech data comprises these features. The feature extraction may include at least one of cepstral analysis, spectral analysis and perceptual analysis. In some embodiments, the method includes compressing the extracted features before transmitting them over the packet network. The compression may use at least one of linear quantization and vector quantization.
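Linear (uniform scalar) quantization of the extracted features can be sketched as follows. Each coefficient is clipped to an assumed range [lo, hi] and mapped to the nearest of 2**bits equally spaced levels, so every coefficient costs exactly `bits` bits. The range, the bit depth and the function names are illustrative assumptions that would have to be chosen for a real front end.

```python
from typing import List, Sequence


def linear_quantize(values: Sequence[float], lo: float, hi: float, bits: int) -> List[int]:
    """Map each value to the nearest of 2**bits uniformly spaced levels in [lo, hi]."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [round((min(max(v, lo), hi) - lo) / step) for v in values]


def linear_dequantize(codes: Sequence[int], lo: float, hi: float, bits: int) -> List[float]:
    """Reconstruct approximate feature values at the receiving end."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + c * step for c in codes]


# Usage: 8-bit codes for a few hypothetical cepstral coefficients in [-10, 10].
coeffs = [-3.21, 0.47, 9.8, 12.5]          # 12.5 lies outside the range and is clipped
codes = linear_quantize(coeffs, -10.0, 10.0, 8)
restored = linear_dequantize(codes, -10.0, 10.0, 8)
```

For in-range values the reconstruction error is at most half a quantization step, which is the loss the recognizer at the receiving end has to tolerate.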
In yet another aspect, this invention is a device having a mechanism for inputting speech, and a mechanism constructed and adapted to selectively perform partial speech recognition on the speech to produce speech data. The partial speech recognition may include feature extraction from the speech, in which case the speech data comprises these features; the feature extraction may include at least one of cepstral analysis, spectral analysis and perceptual analysis. Preferably, partial speech recognition is performed only if speech recognition is to be performed at the receiving end.
The device may also have a mechanism constructed and adapted to compress the extracted features. The device may compress the features using at least one of linear quantization and vector quantization.
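Vector quantization instead replaces a whole feature vector with the index of its nearest entry in a codebook shared by sender and receiver, so each vector costs only ceil(log2(codebook size)) bits. In the minimal sketch below, the codebook is a hypothetical toy and the function names are illustrative; real codebooks are trained offline, for example with the k-means or Linde-Buzo-Gray (LBG) algorithm.

```python
import math
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_encode(vectors: Sequence[Sequence[float]], codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each vector with the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))
            for v in vectors]


def vq_decode(indices: Sequence[int], codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look the codewords back up at the receiving end."""
    return [list(codebook[i]) for i in indices]


def bits_per_index(codebook_size: int) -> int:
    """Bits needed to send one codebook index."""
    return max(1, math.ceil(math.log2(codebook_size)))


# Usage with a hypothetical 2-D codebook of three codewords.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
indices = vq_encode([[0.1, -0.2], [4.6, 5.1], [0.9, 1.2]], codebook)
```

Compared with linear quantization, vector quantization exploits correlations between coefficients of the same frame, at the cost of storing the codebook at both ends.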

