Method and apparatus for matching slides in video

Image analysis – Pattern recognition – Feature extraction

Reexamination Certificate

Details

C382S203000, C382S218000, C707S793000

active

06701014

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention generally relates to matching electronic slides, or the images contained in electronic slides, to their appearance in a video stream.
2. Description of the Related Art
Detecting and recognizing slides or transparencies used by a speaker in a video is important for detecting teaching-related events in video. This is particularly important in the domain of distributed or distance learning, whose long-standing goal has been to provide a quality of learning comparable to the face-to-face environment of a traditional classroom for teaching or training. Effective preparation of online multimedia courses or training material is currently beset by the high cost of manual indexing, slow turnaround, and inconsistencies arising from human interpretation. Automatic methods of cross-linking and indexing multimedia information are very desirable in such applications, as they can respond to higher-level semantic queries, such as the retrieval of learning material relating to a topic of discussion. Automatically cross-linking multimedia information, however, is a non-trivial problem, as it requires the detection and identification of events whose common threads appear in multiple information modalities. An example of such an event is the point in a video where a topic was discussed. A survey of the distance learning community found that the single most useful query for students is a query for a topic of interest in a long recorded video of a course lecture. Such classroom lectures and talks are often accompanied by foils (also called slides), some of which convey the topic being discussed at that point in time. When such lectures are videotaped, at least one of the cameras used captures the displayed slide, so the visual appearance of a slide in the video can be a good indication of the beginning of a discussion relating to a topic.
The recognition of slides in the video, however, is a challenging problem for several reasons. First, the imaging scenario in which a lecture or talk was taped can take a variety of forms. A single camera may look at the speaker, the screen showing the slide, and the audience, often zooming or panning. Alternatively, one camera may be dedicated to the screen, or to the overhead projector where the transparency is displayed, while another camera looks at the speaker. The final video in this case may have been edited to merge the two video streams.
Thus, the slide appearing in a video stream could consume an entire video frame or occupy only a region of a frame. Second, depending on the distance between the camera and the projected slide, the scale at which the slide image appears in the video could be reduced, making it difficult to read the text on the slide and hard to detect text using conventional text recognition methods. Additionally, the color on the slides undergoes transformation due to projection geometry, lighting conditions of the scene, noise in the video capture, and MPEG conversion. Finally, the slide image often appears occluded and/or skewed. For example, the slide image may be only partially visible because another object (e.g., the person giving the talk) is covering or blocking it.
Previous approaches to slide detection have worked on the premise that the camera is focused on the slides, so that simple change detection through frame subtraction can be used to detect changes in slides. There has been some work in the multimedia authoring community to address this problem from the standpoint of synchronizing foils with video. The predominant approach has been on-line synchronization, using a structured note-taking environment to record slide change times electronically and synchronize them with the video stream. Current presentation environments such as Lotus Freelance or PowerPoint have features that can also record slide change times when run in rehearsal mode. In distributed learning, however, there is often a need for off-line synchronization of slides, since they are often provided by the teacher after a video recording of the lecture has been made. The detection of foils in a video stream under these settings can be challenging. A solution to this problem has been proposed for a two-camera geometry, in which one of the cameras was fixed on the screen depicting the slide. Since this was a more-or-less calibrated setting, the boundary of the slide was visible, so the task of selecting a slide-containing region in a video frame was made easy. Further, corners of the visible quadrilateral structure could be used to solve for the ‘pose’ of the slide under the general projective transform. Therefore, there is a need for a system that can recognize slides regardless of whether they are distorted, blurred, or partially blocked. A system and process for “recognizing” slides has not been explored before. The inventive approach to foil detection is meant to consider more general imaging situations involving one or more cameras, and greater variations in scale, pose, and occlusion.
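For illustration only, the frame-subtraction premise of those earlier fixed-camera approaches can be sketched as follows. The function name, the thresholds, and the assumption of pre-extracted grayscale frames are all hypothetical and are not taken from any system cited here.

import numpy as np

def detect_slide_changes(frames, diff_threshold=0.15):
    """Return indices of frames where the slide likely changed.

    frames: iterable of grayscale frames as equal-sized 2-D numpy arrays.
    diff_threshold: fraction of pixels that must change to flag a slide change
                    (an assumed value, chosen only for illustration).
    """
    change_indices = []
    prev = None
    for i, frame in enumerate(frames):
        if prev is not None:
            # Pixel-wise absolute difference between consecutive frames.
            changed = np.abs(frame.astype(np.int16) - prev.astype(np.int16)) > 25
            if changed.mean() > diff_threshold:
                change_indices.append(i)
        prev = frame
    return change_indices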
Other related art includes the principle of geometric hashing and its variants [Lamdan and Wolfson, Proc. Int. Conf. on Computer Vision (ICCV), 1988; Rigoutsos and Wolfson, Special Issue on Geometric Hashing, IEEE Computational Science and Engineering, 1997; Wolfson, “On Curve Matching,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 483-489, 1990, incorporated herein by reference], which has been applied earlier to the problem of model indexing in computer vision, and the related technique of location hashing, which has also been disclosed earlier [Syeda-Mahmood, Proc. Int. Conf. on Computer Vision and Pattern Recognition (CVPR), 1999, incorporated herein by reference].
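As a rough illustration of the geometric hashing principle referenced above (not the patent's exact procedure), each feature point can be rewritten in the affine frame spanned by an ordered triple of non-collinear basis points, so that the resulting coordinate pair is invariant to affine transformations of the point set. The function and variable names below are assumptions.

import numpy as np

def affine_coordinates(points, basis):
    """Express 2-D feature points in the affine frame of a basis triple.

    points: (N, 2) array of feature point locations.
    basis:  indices (i, j, k) of three non-collinear points in `points`.
    Returns an (N, 2) array of (alpha, beta) pairs, one per point, satisfying
    p = p0 + alpha * (p1 - p0) + beta * (p2 - p0).
    """
    p0, p1, p2 = (points[i] for i in basis)
    # Columns of B span the affine frame; solve p - p0 = B @ [alpha, beta].
    B = np.column_stack((p1 - p0, p2 - p0))
    return np.linalg.solve(B, (points - p0).T).T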
SUMMARY OF THE INVENTION
It is, therefore, an object of the present invention to match electronic slides to their appearance in a video stream. This is accomplished by first generating a set of keyframes from the video stream. Next, likely matches between electronic slide images and images in the keyframes are identified by color-matching the slide images with regions of the keyframes. Geometric features are then extracted from the slide images and from the regions within the keyframes. The invention reduces the structure of the slide images and the keyframe images to corresponding sets of affine coordinates, and places the slide image affine coordinates in a balanced binary search tree, called a hash tree, for efficiency of search.
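A minimal sketch of the slide-side index, assuming affine coordinates have already been computed as in the sketch above: the patent describes a balanced binary search tree (the hash tree); here a sorted list searched with bisect stands in for that tree, and the quantization cell size and helper names are illustrative assumptions rather than the patent's exact scheme.

import bisect

def quantize(coord, cell=0.05):
    """Map a real-valued (alpha, beta) pair to a discrete grid cell key."""
    return (round(coord[0] / cell), round(coord[1] / cell))

def build_hash_tree(slide_coords):
    """Build the slide-side index.

    slide_coords: dict mapping slide_id -> list of (alpha, beta) pairs.
    Returns a sorted list of (key, [slide_id, ...]) entries, searchable in
    O(log n) with bisect, mimicking the balanced search tree of the patent.
    """
    buckets = {}
    for slide_id, coords in slide_coords.items():
        for c in coords:
            buckets.setdefault(quantize(c), []).append(slide_id)
    return sorted(buckets.items())

def lookup(tree, coord):
    """Return the slide ids stored under the cell containing `coord`."""
    key = quantize(coord)
    i = bisect.bisect_left(tree, (key,))
    if i < len(tree) and tree[i][0] == key:
        return tree[i][1]
    return []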
The invention matches the slide images and keyframe images by indexing the hash tree with the keyframe image affine coordinates. When matches to the keyframe image affine coordinates are found in the hash tree, the invention records hits in a histogram. Lastly, each keyframe is paired with the slide image with the most hits, and the paired keyframe is designated as containing the corresponding slide image.
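The voting stage might look like the following sketch, under the same assumptions as above; the minimum-hit threshold is a made-up parameter, not a value from the patent, and `lookup_fn` is expected to wrap the `lookup` helper from the previous sketch.

from collections import Counter

def match_keyframe(keyframe_coords, lookup_fn, min_hits=10):
    """Vote for slides given the affine coordinates of one keyframe.

    keyframe_coords: iterable of (alpha, beta) pairs from a keyframe region.
    lookup_fn: callable mapping a coordinate pair to a list of slide ids,
               e.g. lambda c: lookup(tree, c) using the sketch above.
    min_hits: vote threshold (an assumed parameter, not from the patent).
    Returns (best_slide_id, hits), or (None, 0) if no slide gets enough votes.
    """
    histogram = Counter()
    for coord in keyframe_coords:
        for slide_id in lookup_fn(coord):
            # Each indexed hit is a vote for the corresponding slide.
            histogram[slide_id] += 1
    if not histogram:
        return None, 0
    slide_id, hits = histogram.most_common(1)[0]
    return (slide_id, hits) if hits >= min_hits else (None, 0)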
In summary, the present invention performs a method for matching slides to video comprising generating keyframes from the video, extracting geometric keyframe features from the keyframes and geometric slide features from the slides, and matching the geometric slide features and the geometric keyframe features via efficient indexing using a hash tree, so as to avoid an exhaustive search of slides per video frame.


REFERENCES:
patent: 5845288 (1998-12-01), Syeda-Mahmood
patent: 6236395 (2001-05-01), Sezan et al.
patent: 6366296 (2002-04-01), Boreczky et al.
patent: 6404925 (2002-06-01), Foote et al.
patent: 6434269 (2002-08-01), Hamburg
patent: 6507838 (2003-01-01), Syeda-Mahmood
Flickner et al., “Query by Image and Video Content: The QBIC System,” Computer (ISSN 0018-9162), vol. 28, no. 9, pp. 23-32, Sep. 1995.
“Multimedia Access and Retrieval: The State of the Art and Future Direction,” Proc. ACM Multimedia, 1999, ISBN 1-58113-151-8, pp. 443-445.
Tanveer Fathima Syeda-Mahmood
