Method and apparatus for retrieving a video and audio scene...

Data processing: speech signal processing, linguistics, language – Speech signal processing – Recognition


Details

Subclasses: C704S500000, C704S235000
Type: Reexamination Certificate
Status: active
Patent number: 06611803

ABSTRACT:

TECHNICAL FIELD
The present invention relates to a video retrieval apparatus and method capable of retrieving a desired scene (video and/or voice) using a key word.
BACKGROUND ART
The recent rapid spread of multi-channel broadcasting and of computer networks such as the internet delivers a huge amount of video to society, including the home. At the same time, the growing capacity of recording media allows a large amount of video signals to be stored at home. These developments call for techniques that can retrieve a scene a user desires from this large volume of video easily and with high accuracy.
Conventional retrieval systems include a method that detects change points from variations in the video signal and displays the scenes delimited by those points, and a method that uses image recognition to detect and display particular scenes containing particular objects. These systems, however, have the problem that the user's retrieval intent is not always reflected accurately in the retrieved scene.
There is also a retrieval system that uses character recognition to read subtitle information and the closed-caption information adopted in American broadcasting from the video, and retrieves a particular scene from that text. For programs that make good use of subtitles or closed captions, this system lets a user obtain scenes that accurately reflect the retrieval intent. However, because such information must be inserted manually, it is limited to a fraction of broadcast programs, and the approach is difficult to apply widely to video in general.
On the other hand, a retrieval system that uses the voice information accompanying the video as the source of key words can be expected to reflect the retrieval intent accurately. Unexamined Japanese Patent Publication HEI 6-68168 discloses a video retrieval system that retrieves a desired scene using a spoken key word.
FIG. 1 illustrates a functional block diagram of the retrieval system disclosed in the above-mentioned Unexamined Japanese Patent Publication HEI 6-68168. Voice/video input section 201 receives a voice signal and a video signal; voice signal storage section 202 stores the received voice signal, and video signal storage section 203 stores the received video signal. Voice analysis section 204 analyzes the voice signal to generate a sequence of characteristic parameters representing characteristics of the voice. Voice characteristic storage section 205 stores the generated sequence of characteristic parameters.
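The publication does not specify the form of these characteristic parameters. As a minimal sketch, assuming a simple frame-based log-spectral feature (the function and parameter names below are hypothetical), the analysis step might look like this:

```python
import numpy as np

def characteristic_parameters(signal: np.ndarray, sample_rate: int = 16000,
                              frame_ms: int = 25, hop_ms: int = 10) -> np.ndarray:
    """Split a speech signal into short frames and compute one spectral feature
    vector (log power spectrum) per frame, yielding a parameter sequence."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop_len)]
    # One row per frame; a sequence like this is what voice characteristic
    # storage section 205 would hold for later comparison.
    return np.array([np.log(np.abs(np.fft.rfft(f * window)) ** 2 + 1e-10) for f in frames])
```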
Meanwhile, a key word that the user will later use for scene retrieval is provided, in the form of a voice, to key word characteristic analysis section 206. Key word characteristic analysis section 206 analyzes the key word voice to generate a sequence of characteristic parameters representing characteristics of the key word. Key word characteristic parameter storage section 207 stores the generated sequence of characteristic parameters.
Key word interval extraction section 208 compares the sequence of characteristic parameters of the voice signal stored in voice signal storage section 202 with the sequence of characteristic parameters of the key word voice, and extracts the key word intervals in the voice signal. Index addition section 209 generates index position data 210 that relates each extracted key word interval to the frame number of the video signal corresponding to the voice signal.
When retrieval is performed using index position data 210, the frame number of the video signal in which the key word appears can be designated from the voice signal, so that video/voice output section 211 can output the corresponding video and voice and thereby present the user with the desired video and voice.
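For concreteness, here is a minimal sketch of how index position data of this kind could be represented and queried; the record layout and all names are assumptions for illustration, not taken from the publication:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class IndexEntry:
    """One record of index position data: a detected key word interval in the
    audio related to the corresponding video frame number."""
    keyword: str        # the pre-registered spoken key word
    start_sec: float    # start of the key word interval in the voice signal
    end_sec: float      # end of the key word interval
    frame_number: int   # video frame corresponding to the start of the interval

def find_scenes(index: List[IndexEntry], keyword: str) -> List[int]:
    """Return the frame numbers at which the registered key word was detected."""
    return [entry.frame_number for entry in index if entry.keyword == keyword]

# Example: jump to every scene for which the key word "goal" was registered and detected.
index = [IndexEntry("goal", 12.4, 12.9, 310), IndexEntry("goal", 95.0, 95.6, 2375)]
print(find_scenes(index, "goal"))  # -> [310, 2375]
```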
However, this system has the problem that the voice key words to be used in retrieval must be registered in advance, and retrieval with any other key word is impossible. In particular, when a user inputs an uncertain key word, retrieval errors occur, and a scene that accurately reflects the retrieval intent cannot be retrieved.
DISCLOSURE OF INVENTION
The present invention has been made in view of the foregoing. It is an object of the present invention to provide an apparatus and method capable of retrieving a scene that a user desires, in retrieving video and/or voice, even with an out-of-vocabulary word (a word other than the words and key words registered in advance, for example, in a dictionary) or an uncertain key word input by the user.
The present invention provides a scene retrieval system that divides the series of voice recognition processing procedures between the generation of retrieval data and the retrieval processing, and is thereby capable of retrieving and reproducing a video/voice scene that a user desires at high speed.
Further, in generating the retrieval data, a sequence of subword scores, which is an intermediate result of the voice recognition processing, is generated as a retrieval index; in the retrieval processing, an input key word is converted into a time series of subwords and collated with the retrieval index.
Therefore, collation against a word dictionary or against retrieval key words registered in advance is unnecessary, which solves the so-called out-of-vocabulary problem of being unable to cope with unregistered key words. It is also possible to retrieve the video/voice scene with the highest reliability even when a user inputs an uncertain key word.
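A minimal sketch of this idea follows, assuming a toy index of per-frame subword scores and a greatly simplified collation (a fixed one-subword-per-frame alignment in place of the dynamic alignment a real recognizer would use); the names and the syllable conversion table are hypothetical:

```python
from typing import Dict, List

# Hypothetical retrieval index: for each analysis frame, a score for every subword
# (e.g., an acoustic likelihood produced as an intermediate result of recognition).
SubwordScores = Dict[str, float]
RetrievalIndex = List[SubwordScores]

def keyword_to_subwords(keyword: str) -> List[str]:
    """Toy conversion of a typed key word into a subword (here, syllable) sequence.
    A real system would use pronunciation/syllabification rules, so no dictionary
    of registered key words is needed."""
    toy_table = {"goal": ["go", "o", "ru"], "sakura": ["sa", "ku", "ra"]}
    return toy_table.get(keyword, list(keyword))

def collate(index: RetrievalIndex, subwords: List[str]) -> int:
    """Slide the subword sequence over the index (one subword per frame here, for
    simplicity) and return the start frame with the highest accumulated score."""
    best_frame, best_score = 0, float("-inf")
    for start in range(len(index) - len(subwords) + 1):
        score = sum(index[start + i].get(sw, -10.0) for i, sw in enumerate(subwords))
        if score > best_score:
            best_frame, best_score = start, score
    return best_frame

# Usage: the key word was never registered anywhere; it is matched purely
# against the stored subword scores.
idx = [{"go": -1.0}, {"o": -0.5}, {"ru": -0.8}, {"to": -2.0}]
print(collate(idx, keyword_to_subwords("goal")))  # -> 0
```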
Moreover, the sequence of subword scores serving as the retrieval index is multiplexed into a data stream together with the video signal and voice signal, so that the retrieval index can be transmitted over broadcast networks and communication networks such as the internet.
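Below is a minimal sketch of such multiplexing, using an invented packet layout (type byte, length field, payload) purely for illustration; this is not a format defined by the patent or by any broadcast standard:

```python
import json
import struct

def mux_packets(video_chunks, audio_chunks, retrieval_index):
    """Interleave video, audio, and retrieval-index payloads into one byte stream.
    Each packet: 1-byte type ('V', 'A' or 'I'), 4-byte big-endian length, payload."""
    payloads = [(b"V", chunk) for chunk in video_chunks]
    payloads += [(b"A", chunk) for chunk in audio_chunks]
    payloads.append((b"I", json.dumps(retrieval_index).encode("utf-8")))
    stream = bytearray()
    for ptype, payload in payloads:
        stream += ptype + struct.pack(">I", len(payload)) + payload
    return bytes(stream)

# A receiver can demultiplex the 'I' packets and rebuild the retrieval index
# without re-running voice recognition on the received audio.
```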
A subword is a basic unit of an acoustic model that is smaller than a word. Examples of subwords are phonemes, syllables such as consonant-vowel and vowel-consonant-vowel units, and demisyllables. Each word is represented as a sequence of subwords; a toy example is given below.
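As a toy illustration (the decompositions below are examples only, not data from the patent):

```python
# Toy illustration: the word "sakura" expressed with two different subword inventories.
phoneme_sequence  = ["s", "a", "k", "u", "r", "a"]   # phoneme subwords
syllable_sequence = ["sa", "ku", "ra"]               # consonant-vowel syllable subwords
```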


REFERENCES:
patent: 4987596 (1991-01-01), Ukita
patent: 5473726 (1995-12-01), Marshall
patent: 5710591 (1998-01-01), Bruno et al.
patent: 5774859 (1998-06-01), Houser et al.
patent: 5806036 (1998-09-01), Stork
patent: 5835667 (1998-11-01), Wactlar
patent: 6505153 (2003-01-01), Van Thong et al.
patent: 4436692 (1995-04-01), None
patent: 3-53379 (1991-03-01), None
patent: 5-108727 (1993-04-01), None
patent: 6-68168 (1994-03-01), None
patent: 9-134194 (1997-05-01), None
patent: 10172245 (1998-06-01), None
James, D. A. and S. J. Young, “A Fast Lattice-Based Approach to Vocabulary Independent Wordspotting,” Proc. ICASSP 94, Adelaide, vol. 1, pp. 377-380, 1994.
English Language Abstract of JP 5-108727.
English Language Abstract of JP 10-172245.
Joho kagaku koza E•19•3 Onsei Ninshiki, pp. 90-93, Yasunaga Niimi, Kyoritsu Shuppan K.K. (Japan) (Oct. 10, 1979), with a partial English language translation.
“Development of a Video Retrieval System by Automatic Speech Recognition and Meta-data Technology,” Hiroshi Furuyama et al., Technical Report of IEICE (Institute of Electronics, Information and Communication Engineers), IE99-2, PRMU99-46, MVE99-42, Jul. 1997, with English language abstract.
“Acoustic Indexing for Multimedia Retrieval and Browsing”, S.J. Young et al., Acoustics, Speech, and Signal Processing, 1997 IEEE International Conference, Munich, Germany, Apr. 21-24, 1997.
“Phonetic Recognition for Spoken Document Retrieval”, K. Ng et al., pp. 325-328, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, Washington, USA, May 12-15, 1998.
“Vision: A Digital Video Library”, W. Li et al., Proceedings of the ACM International Conference on Digital Libraries, Mar. 20, 1996, pp. 19-27.
“Audio-to-Visual Conversion for Multimedia Communication”, R. R. Rao et al., IEEE Transactions on Industrial Electronics, IEEE Inc., New York, US, vol. 4
