Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2000-09-05
2004-03-23
Mirzahi, Diane D. (Department: 2175)
C707S793000
active
06711587
ABSTRACT:
BACKGROUND OF THE INVENTION
The World Wide Web (“WWW”) comprises millions of documents (web pages) formatted in Hypertext Markup Language (“HTML”), which can be accessed by thousands of users through the Internet. To access a web page, its Uniform Resource Locator (“URL”) must be known. Search engines index web pages and make those URLs available to users of the WWW. To generate an index, a search engine may search the WWW for new web pages using a web crawler. The search engine analyzes the content of each web page, selects the relevant information, and saves that information and the web page's URL in the index.
Web pages also contain links to other documents on the WWW, for example, text documents and image files. By searching web pages for links to image files, a search engine connected to the WWW provides an index of image files located on the WWW. The index contains a URL and a representative image from the image file.
Web pages also contain links to multimedia files, such as video and audio files. By searching web pages for links to multimedia files, a multimedia search engine connected to the WWW, such as Scour Inc.'s SCOUR.NET, provides an index of multimedia files located on the WWW. SCOUR.NET's index for video files provides text describing the contents of the video file and the URL for the multimedia file. Another multimedia search engine, WebSEEK, summarizes a video file by generating a highly compressed version of the video file. The video file is summarized by selecting a series of frames from shots, or scenes, in the video file and repackaging the frames as an animated GIF file. WebSEEK also generates a color histogram from each shot in the video to automatically classify the video file and allow content-based visual queries. WebSEEK is described in John R. Smith et al., “An Image and Video Search Engine for the World-Wide Web”, Symposium on Electronic Imaging: Science and Technology - Storage and Retrieval for Image and Video Databases V, San Jose, Calif., February 1997, IS&T/SPIE.
Finding a representative image of a video to display is very subjective. Also, analyzing the contents of digital video files linked to web pages is difficult because of the low quality and low resolution of the highly compressed digital video files.
SUMMARY OF THE INVENTION
One technique for finding a representative image of a video to display is to find a frame which is likely to include people. This technique is described in co-pending U.S. patent application Ser. No. 09/248,545 entitled “System for Selecting a Keyframe to Represent a Video” by Frederic Dufaux et al. The likelihood of people in a frame is determined by measuring the percentage of skin color in the frame. Skin-color detection is a learning-based system trained on large amounts of labeled data sampled from the WWW. Skin-color detection returns, for each frame in the shot, the percentage of pixels classified as skin.
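The per-frame skin percentage described above can be sketched as follows. This is only an illustration: the text describes a learning-based classifier, while `toy_is_skin` below is a hypothetical hand-written RGB rule standing in for that trained model, and the frame layout (rows of `(r, g, b)` tuples) is an assumption.

```python
from typing import Callable, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def toy_is_skin(pixel: Pixel) -> bool:
    """Hypothetical stand-in rule; NOT the learned model the text describes."""
    r, g, b = pixel
    return (r > 95 and g > 40 and b > 20
            and r > g and r > b
            and r - min(g, b) > 15)


def skin_percentage(frame: Sequence[Sequence[Pixel]],
                    is_skin: Callable[[Pixel], bool] = toy_is_skin) -> float:
    """Return the percentage (0..100) of pixels classified as skin."""
    total = 0
    skin = 0
    for row in frame:
        for pixel in row:
            total += 1
            if is_skin(pixel):
                skin += 1
    # An empty frame has no skin pixels by convention.
    return 100.0 * skin / total if total else 0.0
```

Passing the classifier as a parameter keeps the percentage computation independent of whichever skin model is actually trained.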
The present invention provides a mechanism for selecting a representative image from a video file by applying face detection to the video to select a key frame that may include people. It has particular application to indexing video files located by a search engine web crawler. A key frame, one frame representative of a video file, is extracted from the sequence of frames. The sequence of frames may include multiple scenes or shots, for example, continuous motions relative to a camera separated by transitions, cuts, fades and dissolves. To extract a key frame, face detection is performed in each frame, and a key frame is selected from the sequence of frames based on a sum of detected faces in the frame.
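A minimal sketch of selecting a key frame from per-frame face counts is given below. The face detector itself is not shown; `count_faces` is a hypothetical callable supplied by the caller, and breaking ties in favor of the earliest frame is an assumption, not something the text specifies.

```python
from typing import Callable, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_key_frame(frames: Sequence[Frame],
                     count_faces: Callable[[Frame], int]) -> int:
    """Return the index of the frame with the most detected faces.

    Ties go to the earliest frame (an assumption made for this sketch).
    """
    if not frames:
        raise ValueError("cannot select a key frame from an empty sequence")
    counts = [count_faces(frame) for frame in frames]
    # max() returns the first maximal element, so earlier frames win ties.
    return max(range(len(counts)), key=lambda i: counts[i])
```

For example, if each frame is represented by its list of detected face boxes, `select_key_frame(frames, len)` picks the frame with the longest list.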
Face detection in a frame may be performed by creating a set of images for the frame. Each image in the set is smaller than the previous image by the same scale factor. Selected ones of the set of images are searched for faces. The selected ones are dependent on the minimum size face to detect. The validity of a detected face is ensured by tracking overlap of a detected face in consecutive frames.
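The set of scaled images and the overlap check can be sketched roughly as below. All numeric defaults (a scale factor of 1.2, a 20-pixel detector window, an overlap threshold of 0.5) and the use of intersection-over-union as the overlap measure are assumptions made for illustration; the text does not fix any of them.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def pyramid_sizes(width: int, height: int,
                  scale: float = 1.2, window: int = 20) -> List[Tuple[int, int]]:
    """Sizes of successively smaller images, each reduced by the same factor."""
    if scale <= 1.0:
        raise ValueError("scale must be greater than 1")
    sizes = []
    level = 0
    while True:
        w = int(width / scale ** level)
        h = int(height / scale ** level)
        if min(w, h) < window:  # detector window no longer fits
            break
        sizes.append((w, h))
        level += 1
    return sizes


def levels_to_search(num_levels: int, min_face: int,
                     scale: float = 1.2, window: int = 20) -> List[int]:
    """Levels where a window-sized hit maps to a face >= min_face pixels."""
    return [k for k in range(num_levels) if window * scale ** k >= min_face]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes (assumed overlap measure)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def confirmed_faces(previous: Sequence[Box], current: Sequence[Box],
                    threshold: float = 0.5) -> List[Box]:
    """Keep current detections that overlap a detection in the previous frame."""
    return [box for box in current
            if any(iou(box, prev) >= threshold for prev in previous)]
```

Skipping the largest images when only larger faces matter saves most of the search cost, because the first few levels contain the most pixels.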
Shot boundaries may be detected in the sequence of frames. A key shot is selected from shots within the detected shot boundaries based on the number of detected faces in the shot. A shot score may be provided for each detected shot. The shot score is based on a set of measures. The measures may be selected from the group consisting of motion between frames, spatial activity between frames, skin pixels, shot length and detected faces. Each measure includes a respective weighting factor. The weighting factor is dependent on the level of confidence of the measure.
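One way to read the shot score is as a weighted sum of the listed measures, with the key shot being the highest-scoring one. The sketch below assumes each measure has already been normalized so that a higher value is better; the measure names and any weight values are hypothetical, chosen only to mirror the list in the text.

```python
from typing import Dict, List

# Measure names mirror the text; the identifiers themselves are hypothetical.
MEASURES = ("motion", "spatial_activity", "skin_pixels", "shot_length", "faces")


def shot_score(measures: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of a shot's measures; missing measures count as zero."""
    return sum(weight * measures.get(name, 0.0)
               for name, weight in weights.items())


def select_key_shot(shots: List[Dict[str, float]],
                    weights: Dict[str, float]) -> int:
    """Return the index of the shot with the highest score (earliest on ties)."""
    if not shots:
        raise ValueError("no shots to score")
    scores = [shot_score(measures, weights) for measures in shots]
    return max(range(len(scores)), key=lambda i: scores[i])
```

Under this reading, a measure with a higher level of confidence, such as detected faces, would carry a larger weight than a less reliable one.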
Face detection may process different size frames by modifying the size of the frame before performing the face detection.
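As a rough illustration of resizing a frame before detection, the sketch below rescales a frame to a fixed width with nearest-neighbor sampling. The target width of 320 pixels, the nearest-neighbor method, and the nested-list frame layout are all assumptions; the text says only that the frame size is modified.

```python
from typing import List, Sequence, TypeVar

P = TypeVar("P")  # pixel type (e.g. an int or an (r, g, b) tuple)


def resize_nearest(frame: Sequence[Sequence[P]],
                   new_width: int, new_height: int) -> List[List[P]]:
    """Resize a frame (rows of pixels) using nearest-neighbor sampling."""
    if not frame or not frame[0]:
        raise ValueError("frame must be non-empty")
    old_height = len(frame)
    old_width = len(frame[0])
    return [[frame[y * old_height // new_height][x * old_width // new_width]
             for x in range(new_width)]
            for y in range(new_height)]


def normalize_frame(frame: Sequence[Sequence[P]],
                    target_width: int = 320) -> List[List[P]]:
    """Scale a frame to target_width, keeping its aspect ratio."""
    if not frame or not frame[0]:
        raise ValueError("frame must be non-empty")
    old_height = len(frame)
    old_width = len(frame[0])
    new_height = max(1, round(old_height * target_width / old_width))
    return resize_nearest(frame, target_width, new_height)
```

Normalizing every frame to one size lets the same detector settings, such as the minimum face size, be reused across videos of different resolutions.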
REFERENCES:
patent: 5485611 (1996-01-01), Astle
patent: 5600775 (1997-02-01), King et al.
patent: 5635982 (1997-06-01), Zhang et al.
patent: 5821945 (1998-10-01), Yeo et al.
patent: 5956026 (1999-09-01), Ratakonda
patent: 5995095 (1999-11-01), Ratakonda
patent: 6014183 (2000-01-01), Hoang
patent: 6173069 (2001-01-01), Daly et al.
patent: 6195458 (2001-02-01), Warnick et al.
patent: 6263088 (2001-07-01), Crabtree et al.
patent: 6278446 (2001-08-01), Liou et al.
patent: 6298145 (2001-10-01), Zhang et al.
patent: 6331859 (2001-12-01), Crinon
patent: 6363380 (2002-03-01), Dimitrova
patent: 6366296 (2002-04-01), Boreczky et al.
patent: 6389168 (2002-05-01), Altunbasak et al.
patent: 2002/0054083 (2002-05-01), Boreczky et al.
Frankel, C., et al., “WebSeer: An Image Search Engine for the World Wide Web,” (Report No. 96-14). Chicago, IL: University of Chicago Computer Science Department. (Aug. 1, 1996).
Smith, J.R., and Chang, S., “Searching for Images and Videos on the World-Wide Web,” (Report No. 459-96-25). New York, NY: Columbia University Dept. of Electrical Engineering and Center for Image Technology for New Media. (Aug. 19, 1996).
Yeo, B., and Liu, B., “Rapid Scene Analysis on Compressed Video,” IEEE Transactions on Circuits and Systems for Video Technology, 5(6):533-544 (Dec. 1995).
Naphade, M.R., et al., “A High-Performance Shot Boundary Detection Algorithm Using Multiple Cues,” IEEE, 4 pages. (1998).
Zhuang, Y., “Adaptive Key Frame Extraction Using Unsupervised Clustering,” IEEE, 5 pages. (1998).
“Scour.Net Web Site Offers First Multimedia Search Engine and Guide,” Los Angeles, CA: Scour, Inc. company press release, (Aug. 18, 1998).
Zhang, H.Z. et al., “A Video Database System for Digital Libraries,” in Advance in Digital Libraries, Lecture Notes in Computer Science, Chapter 15, p. 321, Springer Verlag, 1995.
Zhang, H.Z. et al., “Automatic Partitioning of Full-motion Video,” Multimedia Systems, vol. 1, pp. 10-28, Jul. 1993.
Jones, M.J. and Rehg, J.M., “Statistical Color Models with Applications to Skin Detection,” TR 98-11, CRL, Compaq Computer Corp., Dec. 1998.
Rowley, H.A., et al., “Neural Network-Based Face Detection,” IEEE Trans. on PAMI, 20(1):23-38, 1998.
Hewlett-Packard Development Company, L.P.
Mirzahi Diane D.
Mofiz Apu