Variable speed audio playback in speech recognition proofreader

Data processing: speech signal processing – linguistics – language – Audio signal bandwidth compression or expansion

Reexamination Certificate

: 1998-09-02
: 2002-01-08
: Dorvil, Richemond (Department: 2645)
: Data processing: speech signal processing, linguistics, language
: Audio signal bandwidth compression or expansion

: C704S270000, C704S275000
: Reexamination Certificate
: active
: 06338038
: ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to the field of speech recognition applications, and in particular, to a method and apparatus for controllably varying audio playback speed in a speech recognition proofreader.
2. Description of Related Art
The detection of errors in a document dictated via speech recognition software is facilitated by a proofreading program that plays the originally dictated audio while simultaneously displaying and/or highlighting the text interpreted by the speech system. Proofreading programs operating in a speech recognition system can play dictated audio synchronized with the display and/or highlighting of the recognized text. Playback facilitates the detection of misrecognized words. As each recognized utterance is played, its corresponding text is also “played”, that is, displayed. Such a mechanism helps the user detect incongruities more easily than by visual inspection alone. In addition, the proofreader provides a “marking” capability, allowing the user to mark such errors for later correction. The proofreader stores the marks and allows the user to review them and correct the corresponding text at a later time. However, some speakers dictate so rapidly that during playback the errors are not easily seen, or even if seen, the playback is too rapid for the user to accurately mark the error, since the next word may already be playing by the time the user has acted. By automatically pausing between each dictated utterance, however, the pace of the playback can be controlled and the user can be afforded the time required to accurately mark the errors.
A typical speech recognition system provides the ability to play the dictated audio for any recognized spoken word. In accordance with this capability, a typical speech recognition system will embody the following features. A first feature is to provide a client with a number (“tag”) that uniquely identifies an individual spoken word or phrase as defined by the speech recognition system. A second feature is that the speech recognition system can be loaded with a memory address pointing to an array of tags and can be directed to play a specific number or range of those tags. A third feature is that the speech recognition system notifies the caller whenever the system has begun playing an individual tag and provides the tag associated with the current spoken word or phrase. The notification occurs asynchronously through the use of a callback function specified by the proofreader and executed by the speech engine. A fourth feature is that the speech recognition system notifies the caller when all the tags have been played. The notification occurs asynchronously through the use of a callback function specified by the proofreader and executed by the speech engine. Such notifications will be generically referred to as “AudioDone” notifications.
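The four features listed above can be pictured with a small mock. This is only a sketch under stated assumptions: `MockSpeechEngine`, `play`, `on_tag_started` and `on_audio_done` are hypothetical names that do not come from the patent or from any real speech engine, and the callbacks here fire synchronously, whereas the text describes asynchronous notifications.

```python
from typing import Callable, Dict, List, Sequence


class MockSpeechEngine:
    """Hypothetical stand-in for a speech engine; not any real product's API."""

    def __init__(self, audio_by_tag: Dict[int, str]) -> None:
        # Feature 1: every recognized word or phrase has a unique numeric tag.
        self.audio_by_tag = audio_by_tag

    def play(
        self,
        tags: Sequence[int],
        start: int,
        count: int,
        on_tag_started: Callable[[int], None],
        on_audio_done: Callable[[], None],
    ) -> None:
        # Feature 2: the caller hands over an array of tags and a range to play.
        for tag in tags[start:start + count]:
            # Feature 3: notify the caller as each tag begins playing.
            on_tag_started(tag)
        # Feature 4: notify the caller once the whole range has been played.
        on_audio_done()


# Usage: play the second and third tags and record the notifications.
events: List[str] = []
engine = MockSpeechEngine({101: "hello", 102: "world", 103: "again"})
engine.play(
    [101, 102, 103],
    start=1,
    count=2,
    on_tag_started=lambda tag: events.append(f"started {tag}"),
    on_audio_done=lambda: events.append("AudioDone"),
)
```

A proofreader could use the per-tag notification to highlight the matching text and the final notification to decide what to play next.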
There is a long-felt need for methods and apparatus to slow, and even variably control, the pace of playback to overcome this difficulty. There is a further long-felt need to control the pace of playback during proofreading by utilizing the features and capabilities of typical speech recognition systems, as described above.
SUMMARY OF THE INVENTION
In accordance with the inventive arrangements, the capabilities and features of speech recognition systems can be advantageously used in a novel and nonobvious manner to provide the fastest possible playback, to slow the playback and to adjust the speed of playback while playback is in progress.
A single call mode is provided for the fastest possible playback, in accordance with which the speech system is loaded with an array of tags and is then directed to play the entire array as one unit.
A multiple call mode is provided for playing each tag individually at slower and variable speeds, one at a time. A range of tags is played by making multiple calls to the speech system to load and play each tag individually, inserting a delay between each call. The delay can be variable.
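The difference between the two modes can be sketched as follows. The names `FakeEngine`, `play_single_call` and `play_multiple_call` are illustrative assumptions, and the fake engine only records what it was asked to play. The `sleep` parameter is injected so the delay can be observed without actually waiting.

```python
import time
from typing import Callable, List, Sequence


class FakeEngine:
    """Hypothetical engine that records each play request it receives."""

    def __init__(self) -> None:
        self.calls: List[List[int]] = []

    def play(self, tags: Sequence[int]) -> None:
        self.calls.append(list(tags))


def play_single_call(engine: FakeEngine, tags: Sequence[int]) -> None:
    # Single call mode: load the whole array and play it as one unit.
    engine.play(tags)


def play_multiple_call(
    engine: FakeEngine,
    tags: Sequence[int],
    delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    # Multiple call mode: one call per tag, with a delay between calls.
    for position, tag in enumerate(tags):
        if position > 0:
            sleep(delay_s)
        engine.play([tag])


# Usage: compare the requests each mode sends to the engine.
fast = FakeEngine()
play_single_call(fast, [1, 2, 3])

slow = FakeEngine()
pauses: List[float] = []
play_multiple_call(slow, [1, 2, 3], delay_s=0.5, sleep=pauses.append)
```

Because `delay_s` is an ordinary argument, a caller could pass a different value on each run, which is one way to picture the variable delay.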
A method for inserting a delay between the playback of individual words or phrases as recognized by a speech recognition system, in accordance with the inventive arrangements, comprises the steps of: (A) waiting for a playback command; (B) measuring a delay upon occurrence of the playback command; (C) initiating playback of only one of the individual words or phrases upon expiration of the delay; (D) waiting for a subsequent playback command; and, (E) upon occurrence of the subsequent playback command, repeating the steps (B), (C) and (D) for playing subsequent ones of the individual words or phrases, one at a time.
The method can further comprise the steps of: (F) generating a user interface for detecting the playback command and playing back the individual words and phrases; and, (G) executing the steps (A), (B), (C), (D) and (E) in an independent thread of execution.
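Steps (A) through (E), run in their own thread of execution, might look like the sketch below. It assumes a queue as the channel for playback commands and `None` as a shutdown sentinel; both are illustrative choices, as are the names `playback_worker` and `play_word`, and none of them is specified by the patent.

```python
import queue
import threading
import time
from typing import Callable, List, Sequence


def playback_worker(
    commands: "queue.Queue",
    words: Sequence[str],
    delay_s: float,
    play_word: Callable[[str], None],
) -> None:
    index = 0
    while True:
        # (A) and (D): block until a playback command arrives.
        command = commands.get()
        if command is None:  # hypothetical shutdown sentinel
            break
        if index >= len(words):
            continue  # nothing left to play
        # (B): measure the delay.
        time.sleep(delay_s)
        # (C): play only one word or phrase once the delay has expired.
        play_word(words[index])
        index += 1
        # (E): loop back and repeat for the next command.


# Usage: the main thread stands in for the user interface.
played: List[str] = []
commands: "queue.Queue" = queue.Queue()
worker = threading.Thread(
    target=playback_worker,
    args=(commands, ["hello", "world"], 0.01, played.append),
)
worker.start()
commands.put("play")
commands.put("play")
commands.put(None)
worker.join()
```

Keeping the wait and the delay in a separate thread means the thread that handles the user interface is never blocked by the pause between words.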
The method can also further comprise the steps of: (F) tracking the playback of the individual words and phrases according to an ordered index; (G) issuing a notification each time a playback of one of the individual words or phrases is completed; (H) automatically repeating the steps (B), (C) and (D) for playing subsequent ones of the individual words or phrases responsive to each notification; and, (I) continuing the playing back until all unplayed ones of the individual words or phrases in the ordered index are played back.
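The notification-driven variant can be sketched like this. `NotifyingEngine` and `SequentialPlayer` are hypothetical names, and the fake engine issues its completion notification synchronously, so the calls nest; a real engine would notify asynchronously, as the background section describes.

```python
import time
from typing import Callable, List, Sequence


class NotifyingEngine:
    """Hypothetical engine: plays one tag, then notifies the caller."""

    def __init__(self) -> None:
        self.played: List[int] = []

    def play(self, tag: int, on_done: Callable[[], None]) -> None:
        self.played.append(tag)
        on_done()  # (G) completion notification, synchronous in this sketch


class SequentialPlayer:
    def __init__(
        self,
        engine: NotifyingEngine,
        tags: Sequence[int],
        delay_s: float,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.engine = engine
        self.tags = list(tags)
        self.delay_s = delay_s
        self.sleep = sleep
        self.next_index = 0  # (F) position in the ordered index

    def start(self) -> None:
        self._play_next()

    def _play_next(self) -> None:
        if self.next_index >= len(self.tags):
            return  # (I) every entry in the ordered index has been played
        self.sleep(self.delay_s)  # (B) measure the delay
        tag = self.tags[self.next_index]
        self.next_index += 1
        self.engine.play(tag, self._play_next)  # (C), then (H) on notification


# Usage: one start() call plays the whole index, one tag at a time.
engine = NotifyingEngine()
pauses: List[float] = []
SequentialPlayer(engine, [10, 11, 12], delay_s=0.2, sleep=pauses.append).start()
```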
In the basic method, and in each of the alternatives, the method can further comprise the step of varying the delay responsive to a user requested delay.
When user requested delays are made, the method can further comprise the steps of: comparing the user requested delay to a predetermined delay; repeating the step (E) if the user requested delay is greater than the predetermined delay; and, terminating the step (E) if the user requested delay is not greater than the predetermined delay. The method can further comprise the step of initiating playback of the individual words or phrases as a continuous stream responsive to the terminating step.
When user requested delays are made, the method can also further comprise the steps of: comparing the user requested delay to a predetermined delay; changing from playing back the individual words or phrases one at a time to playing back the individual words or phrases as a continuous stream whenever the user requested delay is not greater than the predetermined delay; and, changing from playing back the individual words or phrases as a continuous stream to playing back the individual words or phrases one at a time whenever the user requested delay is greater than the predetermined delay.
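The comparison against the predetermined delay reduces to a single test. In the sketch below, `select_mode`, the mode labels and the 0.1 second threshold are illustrative assumptions; the patent text does not give a value for the predetermined delay.

```python
def select_mode(requested_delay_s: float, predetermined_delay_s: float) -> str:
    """Pick the playback mode for a user requested delay."""
    if requested_delay_s > predetermined_delay_s:
        # Delay is greater than the threshold: play one word at a time.
        return "one_at_a_time"
    # Delay is not greater than the threshold: play as a continuous stream.
    return "continuous"


# Usage: re-evaluate the mode each time the user changes the delay.
THRESHOLD_S = 0.1  # hypothetical predetermined delay
modes = [select_mode(delay, THRESHOLD_S) for delay in (0.0, 0.1, 0.5)]
```

A delay exactly equal to the threshold counts as "not greater", so it selects the continuous stream.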

Affiliated with

Hanson Gary Robert

Inventor

Also associated with

Akerman & Senterfitt

Law Firm

Dorvil Richemond

Examiner

International Business Machines Corp.

Corporate Assignee

Opsasnick Michael N.

Examiner
