Speech recognition method and system using compression...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Recognition

Reexamination Certificate

Details

C704S255000, C704S219000

active

06377923

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to voice recognizers generally and to voice recognizers which use LPC vocoder data as input.
BACKGROUND OF THE INVENTION
Voice recognizers are well known in the art and are used in many applications. For example, voice recognition is used in command and control applications for mobile devices, in computer Dictaphones, in children's toys and in car telephones. In all of these systems, the voice signal is digitized and then parametrized. The parametrized input signal is compared to reference parametrized signals whose utterances are known. The recognized utterance is the utterance associated with the reference signal which best matches the input signal.
Voice recognition systems have found particular use in voice dialing systems where, when a user says the name of the person he wishes to call, the voice recognition system recognizes the name from a previously provided reference list and provides the phone number associated with the recognized name. The telephone then dials the number. The result is that the user is connected to his destination without having to look for the dialed number and/or use his hands to dial the number.
Voice dialing is especially important for car mobile telephones where the user is typically the driver of the car and thus, must continually concentrate on the road. If the driver wants to call someone, it is much safer that the driver speak the name of the person to be called, rather than dialing the number himself.
FIG. 1, to which reference is now made, shows the major elements of a digital mobile telephone. Typically, a mobile telephone includes a microphone 10, a speaker 12, a unit 14 which converts between analog and digital signals, a vocoder 16 implemented in a digital signal processing (DSP) chip labeled DSP-1, an operating system 18 implemented in a microcontroller or a central processing unit (CPU), a radio frequency interface unit 19 and an antenna 20. On transmit, the microphone 10 generates analog voice signals which are digitized by unit 14. The vocoder 16 compresses the voice samples to reduce the amount of data to be transmitted, via RF unit 19 and antenna 20, to another mobile telephone. The antenna 20 of the receiving mobile telephone provides the received signal, via RF unit 19, to vocoder 16 which, in turn, decompresses the received signal into voice samples. Unit 14 converts the voice samples to an analog signal which speaker 12 projects. The operating system 18 controls the operation of the mobile telephone.
For voice dialing systems, the mobile telephone additionally includes a voice recognizer 22, implemented in a separate DSP chip labeled DSP-2, which receives the digitized voice samples as input, parametrizes the voice signal and matches the parametrized input signal to reference voice signals. The voice recognizer 22 typically either provides the identification of the matched signal to the operating system 18 or, if a phone number is associated with the matched signal, the recognizer 22 provides the associated phone number.
FIG. 2, to which reference is now made, generally illustrates the operation of voice recognizer 22. The digitized voice samples are organized into frames of a predetermined length, such as 5-20 msec, and it is these frames which are provided (step 28) to recognizer 22. For each frame, the recognizer 22 first calculates (step 30) the energy of the frame.
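The framing and per-frame energy calculation of steps 28 and 30 can be sketched as follows. This is a minimal illustration, not the recognizer's actual implementation: the function name `frame_energies`, the 8 kHz sampling rate, the non-overlapping frames and the use of a sum of squares as the energy measure are all assumptions; the text only states that frames are of a predetermined length such as 5-20 msec.

```python
def frame_energies(samples, frame_len):
    """Split samples into consecutive non-overlapping frames and
    return the sum of squared samples for each complete frame."""
    energies = []
    # Any trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame))
    return energies


# Assumed values: 8 kHz sampling and 20 msec frames give 160 samples per frame.
SAMPLE_RATE_HZ = 8000
FRAME_LEN = SAMPLE_RATE_HZ * 20 // 1000
```

Calling `frame_energies(samples, FRAME_LEN)` on a list of digitized samples would yield one energy value per 20 msec frame, which is the kind of signal plotted as a function of time for a spoken word.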
FIG. 3, to which reference is now also made, illustrates the per frame energy for the spoken word “RICHARD”, as a function of time. The energy signal has two bumps 31 and 33, corresponding with the two syllables of the word. Where no word is spoken, as indicated by reference numeral 35, and even between syllables, the energy level is significantly lower.
Thus, the recognizer 22 searches (step 32 of FIG. 2) for the start and end of a word within the energy signal. The start of a word is defined as the point 37 where a significant rise in energy begins after the energy signal has been low for more than a predetermined length of time. The end of a word is defined as the point 39 where a significant drop in energy finishes after which the energy signal remains low for more than a predetermined length of time. In FIG. 3, the start point 37 occurs at about 0.37 sec and endpoint 39 occurs at about 0.85 sec.
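The word boundary search of step 32 can be sketched as a scan over the per-frame energies. This is a simplified illustration under stated assumptions: a single fixed `threshold` stands in for "significant rise" and "significant drop", `min_low_frames` stands in for the predetermined length of time, and the function name `find_word` is hypothetical. The text does not specify how these quantities are chosen.

```python
def find_word(energies, threshold, min_low_frames):
    """Return (start, end) frame indices of the first word, or None.

    start: first frame at or above threshold that follows at least
           min_low_frames consecutive low-energy frames.
    end:   last frame at or above threshold before a run of at least
           min_low_frames consecutive low-energy frames.
    """
    low_run = 0        # length of the current run of low-energy frames
    start = None       # start frame of the word, once found
    last_high = None   # most recent high-energy frame inside the word
    for i, energy in enumerate(energies):
        if energy < threshold:
            low_run += 1
            # A long enough silence after the word closes it.
            if start is not None and low_run >= min_low_frames:
                return (start, last_high)
        else:
            # A rise after a long enough silence opens the word.
            if start is None and low_run >= min_low_frames:
                start = i
            if start is not None:
                last_high = i
            low_run = 0
    return None  # no complete word (with trailing silence) was found
```

Because a short dip resets nothing but the low-frame counter, a brief drop in energy between two syllables does not end the word; only a silence lasting at least `min_low_frames` frames does.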
If a word is found, as checked in step 34, the voice recognizer 22 performs (step 36) a linear prediction coding (LPC) analysis to produce parameters of the spoken word. In step 38, the voice recognizer 22 calculates recognition features of the spoken word and, in step 40, the voice recognizer 22 searches for a match from among recognition features of reference words in a reference library. Alternatively, the voice recognizer 22 stores the recognition features in the reference library, in a process known as “training”.
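To give a sense of why the LPC analysis of step 36 is computationally demanding, the following sketch computes LPC predictor coefficients for one frame using the autocorrelation method with the Levinson-Durbin recursion. The text does not say which LPC analysis method the recognizer uses, so the choice of this method and the function names `autocorrelation` and `levinson_durbin` are assumptions made for illustration.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of frame[n] * frame[n - k]."""
    return [
        sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for the given autocorrelation values.

    Returns (coeffs, error), where coeffs[k - 1] is the weight a_k in the
    predictor s[n] ~ sum_k a_k * s[n - k], and error is the remaining
    prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused padding so indices match a_k
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break  # silent frame: leave the remaining coefficients at zero
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:], error
```

For example, `levinson_durbin([1.0, 0.5, 0.25], 2)` returns coefficients `[0.5, 0.0]` and an error of `0.75`, since those autocorrelation values are fully explained by a first-order predictor. Running this per frame over every frame of a word, on top of the autocorrelation sums, is work that an LPC-based vocoder has already done when it compressed the voice samples.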
Unfortunately, the voice recognition process is computationally intensive and, thus, must be implemented in the second DSP chip, DSP-2. This adds significant cost to the mobile telephone.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a voice recognizer which operates with compressed voice data, compressed by LPC-based vocoders, rather than with sampled voice data, thereby reducing the amount of computation which the recognizer must perform. Accordingly, the voice recognition can be implemented in the microcontroller or CPU which also implements the operating system. Since the voice recognizer does not analyze the voice signal, the microcontroller or CPU can be one of limited processing power and/or one which does not receive the voice signal.
Moreover, the present invention provides a feature generator which can extract the same type of feature data, for use in recognition, from different types of LPC-based vocoders. Thus, the present invention performs the same recognition operations (e.g. matching and training) on compressed voice data regardless of which type of LPC-based vocoder compressed it.
There is therefore provided, in accordance with a preferred embodiment of the present invention, a method for recognizing a spoken word using linear prediction coding (LPC) based vocoder data without completely reconstructing the voice data. The vocoder based recognizer implements the method described herein. The method includes the steps of generating at least one energy estimate per frame of the vocoder data and searching for word boundaries in the vocoder data using the associated energy estimates. If a word is found, the LPC word parameters are extracted from the vocoder data associated with the word and recognition features are calculated from the extracted LPC word parameters. Finally, the recognition features are matched with previously stored recognition features of other words, thereby to recognize the spoken word.
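The final matching step, together with the storing of features during training, can be sketched as a small reference library. This is an illustrative assumption only: the text does not specify how recognition features are represented or compared, so the fixed-length feature vectors, the Euclidean distance and the class name `ReferenceLibrary` are all hypothetical. Practical recognizers must also handle words of differing durations, which this sketch ignores.

```python
import math


class ReferenceLibrary:
    """Stores one feature vector per known word and finds the closest one."""

    def __init__(self):
        self._entries = {}  # word label -> stored feature vector

    def train(self, label, features):
        """Store the recognition features of a word whose identity is known."""
        self._entries[label] = list(features)

    def match(self, features):
        """Return the label whose stored features are nearest, or None if empty."""
        best_label = None
        best_distance = math.inf
        for label, reference in self._entries.items():
            # Euclidean distance between two equal-length feature vectors.
            distance = math.sqrt(
                sum((x - y) ** 2 for x, y in zip(features, reference))
            )
            if distance < best_distance:
                best_label = label
                best_distance = distance
        return best_label
```

In a voice dialing setting, the returned label would be the name from the previously provided reference list, which the telephone could then map to the associated phone number.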
Additionally, in accordance with a preferred embodiment of the present invention, the energy is estimated from residual data found in the vocoder data. This estimation can be performed in many ways. In one embodiment, the residual data is reconstructed from the vocoder data and the estimate is formed from the norm of the residual data. In another embodiment, a pitch-gain value is extracted from the vocoder data and this value is used as the energy estimate. In a further embodiment, the pitch-gain values, lag values and remnant data are extracted from the vocoder data. A remnant signal is generated from the remnant data and from that, a remnant energy estimate is produced. A non-remnant energy estimate is produced from a non-remnant portion of the residual by using the pitch-gain value and a previous energy estimate defined by the lag value. Finally, the two energy estimates, remnant and non-remnant, are combined.
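Two of the energy estimates described above can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: the norm is taken to be the Euclidean norm, the lag is taken to be in samples and is mapped to a whole number of frames back in a history of earlier estimates, the non-remnant estimate is taken to be the pitch-gain value times that earlier estimate, and the two estimates are combined as a root sum of squares. The function names and parameters are hypothetical.

```python
import math


def residual_norm_energy(residual):
    """Energy estimate formed from the Euclidean norm of a reconstructed residual."""
    return math.sqrt(sum(x * x for x in residual))


def combined_energy(remnant, pitch_gain, lag, past_energies, frame_len):
    """Combine a remnant estimate with a non-remnant estimate.

    remnant:       remnant signal samples for the current frame
    pitch_gain:    pitch-gain value extracted from the vocoder data
    lag:           lag value in samples (assumed unit)
    past_energies: earlier energy estimates, oldest first
    frame_len:     samples per frame, used to convert the lag to frames
    """
    remnant_energy = residual_norm_energy(remnant)

    # Pick the earlier estimate that the lag points back to (at least one frame).
    frames_back = max(1, (lag + frame_len // 2) // frame_len)
    if frames_back <= len(past_energies):
        previous = past_energies[-frames_back]
    else:
        previous = 0.0  # not enough history yet

    non_remnant_energy = abs(pitch_gain) * previous

    # Assumed combination rule: treat the two portions as uncorrelated.
    return math.sqrt(remnant_energy ** 2 + non_remnant_energy ** 2)
```

Neither function reconstructs the voice samples themselves; both work only on quantities carried in, or derived from, the residual part of the vocoder data, which is the source of the computational saving the method aims for.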
Moreover, in accordance with a preferred embodiment of the present invention, the vocoder data can be from any of the following vocoders: Regular Pulse Excitation-Long Term Prediction (RPE-LTP) full and half rate, Qualcomm Code Excited Linear Predi
