Apparatus and method using speech recognition and scripts to...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application

Reexamination Certificate

active

06336093

ABSTRACT:

FIELD OF THE INVENTION
The present invention is related to the use of speech recognition in data capture, processing, editing, display, retrieval and playback. The invention is particularly useful for capture, authoring, and playback of synchronized audio and video data.
BACKGROUND OF THE INVENTION
While speech recognition technology has been developed over several decades, there are few applications in which speech recognition is commonly used, except for voice-assisted operation of computers or other equipment, and for transcription of speech into text, for example, in word processors.
Use of speech recognition with synchronized audio and video has been primarily for developing searchable indexes of video databases. Such systems are shown in, for example: “Automatic Content Based Retrieval Of Broadcast News,” by M. G. Brown et al. in Proceedings of the ACM International Multimedia Conference and Exhibition 1995, pages 35-43; “Vision: A Digital Video Library,” by Wei Li et al., Proceedings of the ACM International Conference on Digital Libraries 1996, pages 19-27; “Speech For Multimedia Information Retrieval,” by A. G. Hauptmann et al. in Proceedings of the 8th ACM Symposium on User Interface and Software Technology, pages 79-80, 1995; “Keyword Spotting for Video Soundtrack Indexing,” by Philippe Gelin, in Proceedings of ICASSP '96, pages 299-302, May 1996; U.S. Pat. No. 5,649,060 (Ellozy et al.); U.S. Pat. No. 5,199,077 (Wilcox et al.); “Correlating Audio and Moving Image Tracks,” IBM Technical Disclosure Bulletin No. 10A, Mar. 1991, pages 295-296; U.S. Pat. No. 5,564,227 (Mauldin et al.); “Speech Recognition In The Informedia Digital Video Library: Uses And Limitations,” by A. G. Hauptmann in Proceedings of the 7th IEEE Int'l. Conference on Tools with Artificial Intelligence, pages 288-294, 1995; “A Procedure For Automatic Alignment Of Phonetic Transcriptions With Continuous Speech,” by H. C. Leung et al., Proceedings of ICASSP '84, pages 2.7.1-2.7.3, 1984; European Patent Application 0507743 (Stenograph Corporation); “Integrated Image And Speech Analysis For Content Based Video Indexing,” by Y-L. Chang et al., Proceedings of Multimedia 96, pages 306-313, 1996; and “Four Paradigms for Indexing Video Conferences,” by R. Kazman et al., in IEEE Multimedia, Vol. 3, No. 1, Spring 1996, pages 63-73, all of which are hereby incorporated by reference.
Current technology for editing multimedia programs, such as synchronized audio and video sequences, includes systems such as the Media Composer and Film Composer systems from Avid Technology, Inc. of Tewksbury, Massachusetts. Some of these systems use time lines to represent a video program. However, management of the available media data may involve a time-intensive manual logging process. This process may be difficult where a script and notations from the script are used, for example, on a system such as shown in U.S. Pat. No. 4,746,994 (Ettlinger). Speech recognition has many uses beyond mere indexing that may assist in the capture, authoring and playback of synchronized audio and video sequences using such tools for production of motion pictures, television programs and broadcast news.
SUMMARY OF THE INVENTION
Audio associated with a video program, such as an audio track or live or recorded commentary, may be analyzed to recognize or detect one or more predetermined sound patterns, such as words or sound effects. The recognized or detected sound patterns may be used to enhance video processing, by controlling video capture and/or delivery during editing, or to facilitate selection of clips or splice points during editing.
For example, sound pattern recognition may be used in combination with a script to automatically match video segments with portions of the script that they represent. The script may be presented on a computer user interface to allow an editor to select a portion of the script. Matching video segments, having the same sound patterns for either speech or sound effects, can be presented as options for selection by the editor. These options also may be considered to be equivalent media, although they may not come from the same original source or have the same duration.
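The script-matching idea can be illustrated with a minimal sketch. It assumes a recognizer has already produced time-stamped words for each clip; the `Word` type, the `find_matches` function, and the clip names are hypothetical and are not taken from the patent. Exact word matching stands in here for whatever sound pattern matching an actual system would use.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str     # recognized word (hypothetical recognizer output)
    start: float  # seconds from the start of the clip
    end: float

_PUNCT = ".,!?;:\"'"

def normalize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [w.strip(_PUNCT).lower() for w in text.split() if w.strip(_PUNCT)]

def find_matches(script_portion, clips):
    """Return (clip_id, start, end) for each clip span whose recognized
    words equal the selected script portion, word for word."""
    target = normalize(script_portion)
    matches = []
    if not target:
        return matches
    for clip_id, words in clips.items():
        tokens = [w.text.lower() for w in words]
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                last = words[i + len(target) - 1]
                matches.append((clip_id, words[i].start, last.end))
    return matches

# Three takes; two of them contain the selected script text.
clips = {
    "take1": [Word("good", 0.0, 0.3), Word("evening", 0.3, 0.8), Word("everyone", 0.8, 1.4)],
    "take2": [Word("uh", 0.0, 0.2), Word("good", 0.5, 0.8), Word("evening", 0.8, 1.2)],
    "take3": [Word("good", 0.0, 0.3), Word("morning", 0.3, 0.9)],
}
matches = find_matches("Good evening.", clips)
print(matches)
```

The two matching takes come from different clips and have different durations, which corresponds to the notion of equivalent media offered to the editor as alternatives.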
Sound pattern recognition also may be used to identify possible splice points in the editing process. For example, an editor may look for a particular spoken word or sound, rather than the mere presence or absence of sound, in a sound track in order to identify an end or beginning of a desired video segment.
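As a rough sketch of word-based splice point selection, assume the sound track has already been reduced to `(word, start, end)` tuples by some recognizer. The `splice_points` function and the sample track are hypothetical illustrations, not part of the patent.

```python
def splice_points(words, target, edge="start"):
    """Return candidate splice times (in seconds) at the start or end of
    every occurrence of the target word in a recognized sound track."""
    if edge not in ("start", "end"):
        raise ValueError("edge must be 'start' or 'end'")
    idx = 1 if edge == "start" else 2
    return [w[idx] for w in words if w[0].lower() == target.lower()]

# Hypothetical recognizer output: (word, start, end)
track = [("and", 0.0, 0.2), ("cut", 0.4, 0.7), ("rolling", 1.5, 2.0), ("cut", 5.1, 5.4)]
starts = splice_points(track, "cut", edge="start")
ends = splice_points(track, "cut", edge="end")
print(starts, ends)
```

An editor could then step through the returned times rather than scanning the track for silence.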
The presence of a desired sound or word in an audio track also may be used in the capturing process to identify the beginning or end of a video segment to be captured or may be used to signify an event which triggers recording. The word or sound may be identified in the audio track using sound pattern recognition. The desired word or sound also may be identified in a live audio input from an individual providing commentary either for a video segment being viewed, perhaps during capture, or for a live event being recorded. The word or sound may be selected, for example, from the script, or based on one or more input keywords from an individual user. For example, a news editor may capture satellite feeds automatically when a particular segment includes one or more desired keywords. When natural breaks in the script are used, video may be divided automatically into segments or clips as it is captured.
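Keyword-triggered capture can be sketched as a small state machine over a stream of recognized words. The event format, the `capture_segments` function, and the keywords are assumptions made for illustration; an actual system would control a capture device rather than return time ranges.

```python
def capture_segments(events, start_words, stop_words):
    """Scan time-stamped recognized words and return the (start, end)
    ranges to capture, plus the start time of any capture still open
    when the stream ends (or None)."""
    segments = []
    started_at = None
    for t, word in events:
        w = word.lower()
        if started_at is None and w in start_words:
            started_at = t                    # keyword heard: begin capture
        elif started_at is not None and w in stop_words:
            segments.append((started_at, t))  # stop word heard: end capture
            started_at = None
    return segments, started_at

# Hypothetical satellite feed as (time in seconds, recognized word)
feed = [(1.0, "weather"), (4.0, "election"), (9.5, "results"),
        (12.0, "sports"), (15.0, "election"), (20.0, "sports")]
segments, open_start = capture_segments(feed, {"election"}, {"sports"})
print(segments, open_start)
```

Using words that fall at natural breaks in the script as stop words would divide the captured material into clips in the same pass.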
Speech recognition also may be used to provide for logging of material by an individual. For example, a live audio input from an individual providing commentary, either for a video segment being viewed or for a live event being recorded, may be recorded and analyzed for desired words. This commentary may be based on a small vocabulary, such as commonly used for logging of video material, and may be used to index the material in a database.
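A minimal sketch of small-vocabulary logging follows, assuming the commentary has already been recognized into time-stamped words. The vocabulary, the `build_log_index` function, and the in-memory dictionary standing in for a database are all hypothetical.

```python
from collections import defaultdict

# Hypothetical logging vocabulary; a real one would be chosen by the user.
LOG_VOCABULARY = {"interview", "closeup", "wide", "exterior"}

def build_log_index(commentary, vocabulary=LOG_VOCABULARY):
    """Map each logging term to the times (in seconds) at which it was
    spoken, ignoring words outside the vocabulary."""
    index = defaultdict(list)
    for t, word in commentary:
        w = word.lower()
        if w in vocabulary:
            index[w].append(t)
    return dict(index)

# Hypothetical recognized commentary: (time in seconds, word)
commentary = [(3.0, "Wide"), (3.4, "shot"), (10.2, "interview"), (42.0, "wide")]
index = build_log_index(commentary)
print(index)
```

Restricting the index to a fixed vocabulary keeps lookups simple: retrieving every moment logged as "wide" is a single dictionary access.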


REFERENCES:
patent: 4538188 (1985-08-01), Barker et al.
patent: 4685003 (1987-08-01), Westland
patent: 4746994 (1988-05-01), Ettlinger
patent: 5109482 (1992-04-01), Bohrman
patent: 5136655 (1992-08-01), Bronson
patent: 5148154 (1992-09-01), Mackay et al.
patent: 5388197 (1995-02-01), Rayner
patent: 5442744 (1995-08-01), Piech et al.
patent: 5489947 (1996-02-01), Cooper
patent: 5515110 (1996-05-01), Alig et al.
patent: 5534942 (1996-07-01), Bevers, Jr. et al.
patent: 5584006 (1996-12-01), Reber et al.
patent: 5634020 (1997-05-01), Norton
patent: 5649060 (1997-07-01), Ellozy et al.
patent: 5712953 (1998-01-01), Langs
patent: 5786814 (1998-07-01), Moran et al.
patent: 5801685 (1998-09-01), Miller et al.
patent: 5889950 (1999-03-01), Kuzma
patent: 5999173 (1999-12-01), Ubillos
patent: 0 403 118 (1990-12-01), None
patent: 0 469 850 (1992-02-01), None
patent: 0 526 064 (1993-02-01), None
patent: 0 564 247 (1993-10-01), None
patent: 0 592 250 (1994-04-01), None
patent: 0 613 145 (1994-08-01), None
patent: 0 689 133 (1995-12-01), None
patent: 0 706 124 (1996-04-01), None
patent: 0 798 917 (1997-10-01), None
patent: 0 877 378 (1998-11-01), None
patent: 0 899 737 (1999-03-01), None
patent: 0 902 431 (1999-03-01), None
patent: WO 93/21636 (1993-10-01), None
patent: WO 94/03897 (1994-02-01), None
patent: WO 97/39411 (1997-10-01), None
patent: WO 98/05034 (1998-02-01), None
patent: WO 98/25216 (1998-06-01), None
Adobe Premiere for Macintosh, All version downloads, www.adobe.com/support/downloads/prmac.htm, pp. 1-2, Nov. 1992-Mar. 2000.*
Kanade, T., “Immersion into Visual Media: New Applications of Image Understanding,” IEEE Expert, ISSN 0885-9000, IEEE, vol. 1, No. 1, Feb. 1996, pp. 73-80.
Patent Abstracts of Japan, vol. 097, No. 008, JP 09 091928 A, Abstract and Fig., Apr. 1997, Nippon Telegr & Teleph Corp <NTT>.
Kim, Y-B et al., “Content-Based Video Indexing and Retrieval—A Natural Language Approach,” IEICE Transactions on Information and Systems, vol. E79-D, No. 6, Jun. 1996, pp. 695-705.
Comparisonics™, Applications of Comparisonics, pp. 1-5, Web site: http://www.comparisonics.com/apps.html printed Mar. 28, 2000.
The Comparisonics™ White Paper/Apr. 1998, Find Audio and Video! See the Audi
