Speech recognition using nonparametric speech models

Metal tools and implements – making

Reexamination Certificate


Filed: 1997-02-28
Issued: 2001-05-01
Examiner: Hudspeth, David (Department: 2641)
Category: Metal tools and implements, making

Classification: C704S251000, C704S255000
Type: Reexamination Certificate
Status: active
Patent number: 06224636
ABSTRACT:

BACKGROUND
The invention relates to speech recognition.
Speech recognition systems analyze a person's speech to determine what the person said. In a typical frame-based speech recognition system, a processor divides a signal derived from the speech into a series of digital frames, each of which corresponds to a small time increment of the speech. The processor then compares the digital frames to a set of speech models. Each speech model may represent how a word is spoken by a variety of speakers. Speech models also may represent phonemes that correspond to portions of words. Phonemes may be subdivided further within the speech model into phoneme elements (PELs), also known as phoneme nodes. Collectively, the constituent phonemes for a word represent the phonetic spelling of the word.
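The framing step described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_frames`, the parameters `frame_len` and `hop`, the use of overlapping frames, and the choice to drop a trailing partial frame are assumptions, not details taken from the patent.

```python
def split_into_frames(samples, frame_len, hop):
    """Return consecutive frames of frame_len samples, advancing by hop samples.

    Hypothetical helper: a trailing partial frame is dropped (an assumption).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop
    return frames


# Toy "signal" of ten samples; real systems would frame digitized audio.
signal = list(range(10))
frames = split_into_frames(signal, frame_len=4, hop=2)
```

Each resulting frame would then be reduced to a feature vector and compared against the speech models; that feature extraction is outside this sketch.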
The processor determines what the speaker said by finding the speech models that best match the digital frames that represent the person's speech. Speech recognition is discussed in U.S. Pat. No. 4,805,218, entitled “METHOD FOR SPEECH ANALYSIS AND SPEECH RECOGNITION,” which is incorporated by reference.
SUMMARY
In one aspect, generally, the invention features evaluating a speech sample by collecting training observations, partitioning the training observations into groups of related training observations, and assessing a degree to which the speech sample resembles a group of training observations. Prior to receiving a speech sample, utterances may be collected from one or more speakers and the training observations may be collected from the utterances.
For each group of training observations, distances between data points representing the speech sample and the training observations may be determined. A degree to which a group of training observations resembles the speech sample may be based on a proximity between the group of training observations and the speech sample.
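As a rough illustration of the grouping and distance steps, the sketch below partitions labeled training observations and measures how close a sample lies to each group. Grouping by a label such as a PEL name, the Euclidean metric, and the names `partition_by_label` and `distances_to_group` are all assumptions made for the example; the text above does not fix any of them.

```python
import math
from collections import defaultdict


def partition_by_label(observations):
    """Group (label, vector) training observations by label (assumed grouping rule)."""
    groups = defaultdict(list)
    for label, vector in observations:
        groups[label].append(vector)
    return dict(groups)


def distances_to_group(sample, group):
    """Sorted Euclidean distances from the sample to every observation in a group."""
    return sorted(math.dist(sample, obs) for obs in group)


# Toy two-dimensional observations with made-up labels.
training = [
    ("pel_a", (0.0, 0.0)), ("pel_a", (1.0, 0.0)),
    ("pel_b", (5.0, 5.0)), ("pel_b", (6.0, 5.0)),
]
groups = partition_by_label(training)
sample = (0.0, 1.0)

# The group whose closest observation is nearest to the sample.
nearest = min(groups, key=lambda g: distances_to_group(sample, groups[g])[0])
```

Here proximity is judged by the single closest observation; other proximity measures would fit the description equally well.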
The assessment of the speech sample may include applying a variable bandwidth kernel density estimator function—for example, a k-th nearest neighbor density function—derived from the training observations to the speech sample.
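A k-th nearest neighbor density estimate is a variable bandwidth estimator in the sense that the bandwidth at a point is the distance to its k-th nearest training observation. The sketch below uses the simplest textbook form, k / (n · V_d · r_k^d), where V_d is the volume of the unit ball in d dimensions. The function name `knn_density`, the Euclidean metric, and the uniform kernel are assumptions; the patent may use a different kernel or normalization.

```python
import math


def knn_density(sample, group, k):
    """k-th nearest neighbor density estimate of `group` evaluated at `sample`."""
    n = len(group)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the group size")
    d = len(sample)
    # Distance from the sample to its k-th nearest training observation.
    r_k = sorted(math.dist(sample, obs) for obs in group)[k - 1]
    if r_k == 0.0:
        return math.inf  # sample coincides with at least k observations
    # Volume of the d-dimensional unit ball.
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return k / (n * unit_ball * r_k ** d)


# One-dimensional toy groups; the sample sits inside the first group.
near_group = [(0.0,), (1.0,), (2.0,), (3.0,)]
far_group = [(10.0,), (11.0,), (12.0,), (13.0,)]
sample = (1.5,)

near_score = knn_density(sample, near_group, k=2)
far_score = knn_density(sample, far_group, k=2)
```

A higher density suggests that the sample resembles that group of training observations more closely, so `near_score` exceeds `far_score` in this toy case.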
In a two-pass embodiment, a speech model—for example, a statistical representation—may be established from the training observations and compared against the speech sample. The speech sample may be assessed as resembling a group of training observations based on (i) a result of the comparison of the speech sample against the speech model (first pass) and (ii) a result of the assessment of the speech sample against the group of training observations (second pass). Speech recognition may be accomplished by applying weighting factors to the training observation evaluation result and to the model comparison result.
In a three-pass embodiment, the speech sample may be reevaluated (third pass) against the speech model following the first and second passes described above. In that case, speech recognition may be based on the model comparison result (first pass), the training observation evaluation result (second pass), and the reevaluation result (third pass).
In another aspect, the invention generally features recognizing a speech sample by establishing a speech model (for example, a parametric model or other statistical representation) from training observations and identifying a portion of the speech model based on a comparison of the speech sample with the speech model. The speech sample then is evaluated against a subset of the training observations that corresponds to the identified portion of the speech model. The speech sample's content is recognized based on a result of the evaluation.
In one embodiment, the speech sample is divided into a series of frames, each frame is compared against each portion (e.g., phoneme element) of the speech model, and a score is assigned to each portion of the speech model for each frame. A determination that a portion of the speech model is to be identified may be made if that portion's score exceeds a threshold value. The training observations that correspond to each identified portion of the speech model may be compared against each frame of the speech sample. Based on this comparison, the score for each identified portion may be modified—for example, by smoothing with a weighting factor to produce a smoothed score. The content of the speech sample is recognized as corresponding or not to the identified portion based on the modified score.
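The threshold-then-smooth flow for a single frame might look like the following sketch. The linear interpolation formula, the convention that higher scores are better, the score values, and the names `shortlist` and `smooth` are assumptions chosen for illustration; the text above only says that scores exceeding a threshold are identified and then modified using a weighting factor.

```python
def shortlist(parametric_scores, threshold):
    """Keep portions (e.g. PELs) whose first-pass score exceeds the threshold."""
    return {pel: s for pel, s in parametric_scores.items() if s > threshold}


def smooth(parametric_score, nonparametric_score, weight):
    """Linear interpolation of the two scores; weight=1.0 keeps only the parametric score."""
    return weight * parametric_score + (1.0 - weight) * nonparametric_score


# Made-up first-pass scores for one frame (higher is assumed to be better).
parametric_scores = {"pel_a": 0.70, "pel_b": 0.60, "pel_c": 0.10}
# Second-pass scores are only needed for the shortlisted portions.
nonparametric_scores = {"pel_a": 0.40, "pel_b": 0.90}

kept = shortlist(parametric_scores, threshold=0.5)
smoothed = {
    pel: smooth(kept[pel], nonparametric_scores[pel], weight=0.5)
    for pel in kept
}
best = max(smoothed, key=smoothed.get)
```

In this toy case the first pass alone would favor `pel_a`, while the smoothed score favors `pel_b`, which shows how the comparison against training observations can change the outcome for the shortlisted portions.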
In another aspect, a speech recognition system includes an input device configured to receive a speech sample to be recognized, a nonparametric acoustic model comprising utterances from one or more human speakers, and a processor coupled to the input device and to the nonparametric acoustic model. The processor is configured to evaluate the speech sample against the nonparametric acoustic model. The speech recognition system may also include a parametric acoustic model which comprises a statistical representation of the utterances. In that case, the speech sample also is evaluated by the processor against the parametric acoustic model.
In another aspect, the invention generally features a computer program, residing on a computer readable medium, for a speech recognition system which includes a processor and an input device. The computer program includes instructions to receive, via the input device, a speech sample to be recognized and evaluate the speech sample against a nonparametric speech model. The content of the speech sample is recognized based on a result of the evaluation.
In a two-pass embodiment, the computer program includes further instructions to evaluate the speech sample against a parametric speech model and to recognize the content of the speech sample based on a result of the parametric evaluation (first pass) and on the result of the nonparametric evaluation (second pass). The parametric evaluation may be performed before the nonparametric evaluation, after it, or both before and after it (e.g., in a three-pass embodiment). The parametric evaluation may include instructions to identify a subset of the nonparametric speech model against which the speech sample is to be compared during the nonparametric evaluation. The nonparametric evaluation may include instructions to compare the speech sample against a portion of the nonparametric speech model based on the result of the parametric evaluation, for example, based on the subset of the nonparametric speech model identified during the parametric evaluation.
Advantages of this invention may include one or more of the following. Speech may be recognized with nonparametric recognition techniques to reduce the recognition error rate. Speech samples to be recognized may be compared against actual training observations (e.g., utterances from human speakers) rather than against a crude statistical approximation of the training observations (i.e., a parametric model). This allows the speech sample to be analyzed in a manner that takes advantage of fine structures present in the training observations.
Further, speech may be recognized by combining parametric and nonparametric processes in a multiple pass manner to achieve more accurate results without sacrificing the timeliness of a recognition result. By using a parametric recognition process to narrow the universe of speech model units against which a speech sample is to be compared, the processing time for recognition is kept within acceptable limits. At the same time, by using a nonparametric recognition process, a rich body of speech model data may be used to enhance the accuracy of the speech recognition process.
Other features and advantages will become apparent from the following description, including the drawings and the claims.

Affiliated with

Gillick Laurence S.

Inventor

Wegmann Steven A.

Inventor

Also associated with

Dragon Systems, Inc.

Corporate Assignee

Fish & Richardson P.C.

Law Firm

Hudspeth David

Examiner

Wieland Susan

Examiner
