Reexamination Certificate
1998-10-09
2002-04-16
Dorvil, Richemond (Department: 2741)
Data processing: speech signal processing, linguistics, language
Speech signal processing
Application
C704S503000, C704S200000
active
06374225
ABSTRACT:
TECHNICAL FIELD OF THE INVENTION
The present invention pertains to the field of speech, audio, and audio-visual works. In particular, the present invention pertains to method and apparatus for receiving listener input regarding desired speed of playback for portions of a speech, audio, and/or audio-visual work and for developing a “Speed Contour” or “Conceptual Speed Association” data structure which represents the listener input. The listener input serves as a proxy for the listener's interest in, and/or for the listener's ability to comprehend (and/or transcribe), the speech, audio, and/or audio-visual work and will be referred to herein as “listener interest.” For example, the listener might want to slow down some portion of the speech, audio, and/or audio-visual work if the listener was interested in enjoying it more fully, or if the listener was having a hard time comprehending the portion, or if the listener was transcribing information contained in the portion. In further particular, the present invention pertains to method and apparatus for replaying the speech, audio and/or audio-visual work in accordance with the Speed Contour or Conceptual Speed Association data structure to produce a “listener-interest-filtered” work (“LIF” work). The LIF work is useful in a number of applications such as, for example, education, advertising, news delivery, entertainment, public safety announcements and the like.
BACKGROUND OF THE INVENTION
Presently known methods for Time-Scale Modification (“TSM”) enable digitally recorded audio to be modified so that a perceived articulation rate of spoken passages, i.e., a speaking rate, can be modified dynamically during playback. Typical applications of such TSM methods include, but are not limited to, speed reading for the blind, talking books, digitally recording lectures, slide shows, multimedia presentations and foreign language learning. In a typical such application, referred to herein as a Listener-Directed Time-Scale Modification application (“LD-TSM”), a listener can control the speaking rate during playback of a previously recorded speaker. This enables the listener to “speed-up” or “slow-down” the articulation rate and, thereby, the information delivery rate provided by the previously recorded speaker. As is well known to those of ordinary skill in the art, the use of the TSM method in the above-described LD-TSM application enables the sped-up or slowed-down speech or audio to be presented intelligibly at the increased or decreased playback rates. Thus, for example, a listener can readily comprehend material through which he/she is fast-forwarding.
In a typical LD-TSM system, input from the listener can be specified in a number of different ways. For example, input can be specified through the use of key presses (button pushes), mouse movements, or voice commands, all of which are referred to below as “keypresses.” As a result, one can readily appreciate that an LD-TSM system enables a listener to adjust the information delivery rate of a digital audio medium to suit his/her interests and speed of comprehension.
As one can readily appreciate from the above, in order to optimize the use of such an LD-TSM system, there is a need for determining how listeners interact with audio media that provide TSM. In particular, the actual information delivery rate selected by a listener depends on diverse factors such as intelligibility of a speaker, listener interest in the subject matter, listener familiarity with the subject matter, whether the listener is transcribing the content, and the general amount of time the listener has allotted for receiving the contents of the material.
Prior art methods for determining listener interest in portions of speech and/or audio are inherently inaccurate. Specifically, these methods involve detecting fast-forward and rewind patterns of, for example, a cassette tape produced by button pushes. The use of such fast-forward or rewind patterns suffers from various drawbacks. For example, the listener often alternates between fast-forwarding and rewinding over a particular piece of audio material because the information is either not presented, or is unintelligible while fast-forwarding or rewinding. In addition, whenever a playback location is advanced, this either interrupts playback while advancing through the audio material or presents unintelligible versions of the audio material (“chipmunk like” sounds for speed-up, etc.). As such, current methods of determining listener interest are of little use for determining an optimal information delivery rate.
As one can readily appreciate from the above, a need exists in the art for a method and apparatus for determining listener interest in portions of speech, audio, and/or audio-visual works. In addition, a need exists in the art for a method and apparatus for replaying speech, audio and/or audio-visual works in accordance with the determination of listener interest to provide a listener-interest-filtered work (“LIF” work).
SUMMARY OF THE INVENTION
Embodiments of the present invention advantageously satisfy the above-identified need in the art and provide method and apparatus for determining listener interest in portions of speech, audio, and/or audio-visual works and for developing Speed Contours or Conceptual Speed Association data structures that represent measures of listener interest. In addition, further embodiments of the present invention provide method and apparatus for utilizing the Speed Contours or Conceptual Speed Association data structures to play speech, audio and/or audio-visual works in accordance with the Speed Contours or the Conceptual Speed Association data structures to provide listener-interest-filtered works (“LIF” works).
An embodiment of the present invention is an apparatus for generating a Speed Contour which includes an affinity information used to obtain a time-scale modification (TSM) rate and an identifier information used to obtain an identifier of a portion of an audio or audio-visual work associated with the TSM rate, which apparatus comprises: (a) a user input apparatus that receives user information and directs input of a portion of the audio or audio-visual work; (b) a time-scale modification system, responsive to an identifier of the portion, the portion, and a TSM rate, that generates a time-scale modified portion; (c) a time-scale modification monitor, responsive to the user information, the identifier of the portion, and the portion, that generates the TSM rate and the identifier of a portion associated with the TSM rate; and (d) a speed contour generator, responsive to the TSM rate and the identifier of the associated portion, that generates the Speed Contour.
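The pairing of a TSM rate with a portion identifier described above can be illustrated with a minimal sketch. This is not the patented implementation: the class name SpeedContour, the methods record and rate_for, the use of a portion's start time in seconds as its identifier, and the default rate of 1.0 are all assumptions made for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class SpeedContour:
    """Hypothetical Speed Contour: TSM rates keyed by portion start time.

    Assumption: a portion is identified by its start time in seconds, and
    a recorded rate stays in effect until the next recorded start time.
    """
    starts: list = field(default_factory=list)  # portion identifiers, kept sorted
    rates: list = field(default_factory=list)   # TSM rate for each identifier

    def record(self, start, rate):
        """Store the TSM rate the listener chose for the portion at `start`."""
        if rate <= 0:
            raise ValueError("TSM rate must be positive")
        i = bisect_right(self.starts, start)
        if i > 0 and self.starts[i - 1] == start:
            # Same portion adjusted again: keep only the latest rate.
            self.rates[i - 1] = rate
        else:
            self.starts.insert(i, start)
            self.rates.insert(i, rate)

    def rate_for(self, t, default=1.0):
        """Return the rate in effect at time `t`, or `default` if none applies."""
        i = bisect_right(self.starts, t)
        return self.rates[i - 1] if i else default


# Example: the listener slows down at 30 s and speeds up at 60 s.
contour = SpeedContour()
contour.record(0.0, 1.0)
contour.record(30.0, 0.5)
contour.record(60.0, 2.0)
```

During replay, a player could call contour.rate_for(t) for each portion and pass the result to whatever time-scale modification system is in use.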
Another embodiment of the present invention is an apparatus for generating a Conceptual Speed Association data structure which includes an affinity information used to obtain a time-scale modification (TSM) rate and a concept information used to obtain a concept identifier for a portion of an audio or audio-visual work associated with the TSM rate, which apparatus comprises: (a) a user input apparatus that receives user information and directs input of a portion of the audio or audio-visual work; (b) a time-scale modification system, responsive to an identifier of the portion, the portion, and a TSM rate, that generates a time-scale modified portion; (c) a concept decoder, responsive to the identifier of the portion and the portion, that generates a concept for the portion; (d) a time-scale modification concept monitor, responsive to the user information and the concept, that generates the TSM rate and a concept identifier associated with the TSM rate; and (e) a conceptual speed association data structure generator, responsive to the TSM rate and the associated concept identifier, that generates the Conceptual Speed Association data structure.
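The association between a concept identifier and a TSM rate described above can be sketched in a similar way. This is an illustration only. The class name ConceptualSpeedAssociation, the methods observe and rate_for, the string concept identifiers, and in particular the choice to average the observed rates per concept are assumptions and are not stated in the text.

```python
from collections import defaultdict


class ConceptualSpeedAssociation:
    """Hypothetical Conceptual Speed Association: concept identifier -> TSM rate.

    Assumption: the stored rate for a concept is the mean of every rate the
    listener selected while portions tagged with that concept were playing.
    """

    def __init__(self):
        self._totals = defaultdict(float)  # sum of observed rates per concept
        self._counts = defaultdict(int)    # number of observations per concept

    def observe(self, concept_id, rate):
        """Record the TSM rate in effect while `concept_id` was being played."""
        if rate <= 0:
            raise ValueError("TSM rate must be positive")
        self._totals[concept_id] += rate
        self._counts[concept_id] += 1

    def rate_for(self, concept_id, default=1.0):
        """Return the mean observed rate for a concept, or `default` if unseen."""
        n = self._counts.get(concept_id, 0)
        return self._totals[concept_id] / n if n else default


# Example: the listener skims weather items and slows down for finance items.
csa = ConceptualSpeedAssociation()
csa.observe("weather", 2.0)
csa.observe("weather", 1.0)
csa.observe("finance", 0.5)
```

Unlike the time-indexed sketch of a Speed Contour, this structure is keyed by concept, so it could in principle be applied to a different work whose portions have been tagged with the same concept identifiers.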
Another embodiment of the present invention is an apparatus which plays an audio or audio-visual work in conjunction with a Speed Contour which includes an affinity information used to obtain a time-scale modificati
Armstrong Angela
Dorvil Richemond
Einschlag Michael B.
Enounce Incorporated