Semantic video object segmentation and tracking

: 1998-04-02
: 2002-06-04
: Johns, Andrew W. (Department: 2723)
: Image analysis
: Applications
: Target tracking or detecting

: C348S169000
: Reexamination Certificate
: active
: 06400831
: ABSTRACT:

FIELD OF THE INVENTION
The invention relates to semantic video object extraction and tracking.
BACKGROUND OF THE INVENTION
A video sequence is composed of a series of video frames, where each frame records objects at a discrete moment in time. In a digital video sequence, each frame is represented by an array of pixels. When a person views a video frame, it is easy to recognize objects in it, because the person can identify a portion of the frame as being meaningful. This is called attaching semantic meaning to that portion of the video frame. For example, a ball, an aircraft, a building, a cell, or a human body each represents a meaningful entity in the world. Semantic meaning is defined with respect to the user's context. Although vision seems simple to people, a computer does not know that a certain collection of pixels within a frame depicts a person. To the computer, it is only a collection of pixels. However, a user can identify a part of a video frame based upon some semantic criterion (such as by applying an "is a person" criterion), and thus assign semantic meaning to that part of the frame; such identified data is typically referred to as a semantic video object.
An advantage of breaking video stream frames into one or more semantic objects (segmenting, or content-based encoding) is that, in addition to the compression efficiency inherent in coding only active objects, received data may also be more accurately reconstructed, because knowledge of the object characteristics allows better prediction of its appearance in any given frame. Such object tracking and extraction can be very useful in many fields. In broadcasting and telecommunication, video compression is important because transmitting video data requires a large bandwidth. For example, in a newscast monologue with a speaker in front of a fairly static background, bandwidth requirements may be reduced if one identifies (segments) the speaker within a video frame, removes (extracts) the speaker from the background, and then skips transmitting the background unless it changes.
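The newscast example can be illustrated with a minimal sketch. It is not the method of the invention; it only assumes that a boolean object mask is already available for each frame, and the names `encode_frame`, `known_background`, and `tolerance` are hypothetical. Object pixels are sent for every frame, while a background pixel is sent only when the receiver does not yet have it or when it has changed.

```python
def encode_frame(frame, mask, known_background, tolerance=0):
    """Split one grayscale frame into object pixels and background updates.

    frame: 2D list of gray values.
    mask: 2D list of booleans, True where a pixel belongs to the semantic object.
    known_background: 2D list of the background the receiver already holds
        (None where unknown); it is updated in place.
    """
    object_pixels = []
    background_updates = []
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if mask[r][c]:
                # Object pixels are always transmitted.
                object_pixels.append((r, c, value))
                continue
            old = known_background[r][c]
            # Background is transmitted only if unknown or changed.
            if old is None or abs(value - old) > tolerance:
                background_updates.append((r, c, value))
                known_background[r][c] = value
    return object_pixels, background_updates


# Hypothetical 2x3 frames: one bright "speaker" pixel on a static background.
mask = [[False, True, False], [False, False, False]]
known = [[None] * 3 for _ in range(2)]
first = encode_frame([[10, 200, 10], [10, 10, 10]], mask, known)
second = encode_frame([[10, 210, 10], [10, 10, 10]], mask, known)
print(len(first[1]), len(second[1]))  # 5 background pixels sent, then 0
```

In this toy run the second frame costs only the single object pixel, which is the bandwidth saving the paragraph describes.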
Using semantic video objects to improve coding efficiency and reduce storage and transmission bandwidth has been investigated in the upcoming international video coding standard MPEG-4. (See ISO/IEC JTC1/SC29/WG11, MPEG-4 Video Verification Model Version 8.0, July 1997; Lee, et al., A layered video object coding system using sprite and affine motion model, IEEE Trans. on Circuits and Systems for Video Technology, Vol. 7, No. 1, January 1997.) In the computer domain, web technology has new opportunities involving searching and interacting with meaningful video objects in a still or dynamic scene. To do so, extraction of semantic video objects is very important. In the pattern recognition domain, accurate and robust semantic visual information extraction aids medical imaging, industrial robotics, remote sensing, and military applications. (See Marr, Vision, W. H. Freeman, New York, 1982 (hereafter Marr).)
But although useful, general semantic visual information extraction is difficult. Although human eyes see data that our brains easily interpret as semantic video objects, such identification is a fundamental problem for image analysis. This problem is termed the segmentation problem, where the goal is to aid a computer in distinguishing between different objects within a video frame. Objects are separated from each other using some homogeneity criterion. Homogeneity refers to grouping data according to some similar characteristic. Different definitions of homogeneity can lead to different segmentation results for the same input data. For example, homogeneous segmentation may be based on a combination of motion and texture analysis. The criteria chosen for semantic video object extraction will determine the effectiveness of the segmentation process.
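The dependence of the result on the homogeneity definition can be shown with a small sketch. It is an illustrative region-growing routine, not a technique taken from the cited references; the function name `segment` and the criterion "4-connected neighbors differ by at most `threshold` gray levels" are assumptions made for the example.

```python
from collections import deque


def segment(image, threshold):
    """Label 4-connected regions of a grayscale image.

    Two neighboring pixels join the same region when their gray values
    differ by at most `threshold`. Returns (labels, region_count).
    """
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]
    count = 0
    for sr in range(h):
        for sc in range(w):
            if labels[sr][sc] != -1:
                continue  # already assigned to a region
            labels[sr][sc] = count
            queue = deque([(sr, sc)])
            while queue:
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < h and 0 <= nc < w
                            and labels[nr][nc] == -1
                            and abs(image[nr][nc] - image[r][c]) <= threshold):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
            count += 1
    return labels, count


image = [[10, 12, 50],
         [11, 13, 52]]
for threshold in (0, 5, 40):
    print(threshold, segment(image, threshold)[1])  # 6, 2 and 1 regions
```

The same six pixels are split into six, two, or one region depending only on the threshold, which is why the choice of criterion matters so much.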
During the past two decades, researchers have investigated unsupervised segmentation. Some researchers proposed using homogeneous grayscale or homogeneous color as a criterion for identifying regions. Others suggested using homogeneous motion information to identify moving objects. (See Haralick and Shapiro, Image segmentation techniques, CVGIP, Vol. 29, pp. 100-132, 1985; C. Gu, Multi-valued morphology and segmentation-based coding, Ph.D. dissertation, LTS/EPFL, (hereafter Gu Ph.D.), http://Itswww.epfl.-ch/Staff/gu.html, 1995.)
This research in grayscale-oriented analysis can be classified into single-level methods and multi-level approaches. Single-level methods generally use edge-based detection methods, k-nearest neighbor, or estimation algorithms. (See Canny, A computational approach to edge detection, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 8, pp. 679-698, 1986; Cover and Hart, Nearest neighbor pattern classification, IEEE Trans. Information Theory, Vol. 13, pp. 21-27, 1967; Chen and Pavlidis, Image segmentation as an estimation problem, Computer Graphics and Image Processing, Vol. 13, pp. 153-172, 1980.)
Unfortunately, although these techniques work well when the input data is relatively simple, clean, and fits the model well, they lack generality and robustness. To overcome these limitations, researchers focused on multi-level methods such as split and merge, pyramid linking, and morphological methods. (See Burt, et al., Segmentation and estimation of image region properties through cooperative hierarchical computation, IEEE Trans. on Systems, Man and Cybernetics, Vol. 11, pp. 802-809, 1981.)
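As a rough illustration of what "multi-level" means here, the sketch below shows only the split half of a split-and-merge scheme: a square block is divided into quadrants until each block is homogeneous. It is a simplified sketch, not the procedure of the cited work. It assumes a square image whose side is a power of two, uses gray-value range as the homogeneity test, and omits the merge phase that would rejoin similar neighboring blocks. The names `split` and `max_range` are hypothetical.

```python
def split(image, r0, c0, size, max_range):
    """Recursively split a square block into homogeneous sub-blocks.

    A block is homogeneous when (max gray - min gray) <= max_range.
    Returns a list of (row, col, size) tuples describing the leaf blocks.
    """
    values = [image[r][c]
              for r in range(r0, r0 + size)
              for c in range(c0, c0 + size)]
    if size == 1 or max(values) - min(values) <= max_range:
        return [(r0, c0, size)]
    half = size // 2
    blocks = []
    for dr in (0, half):
        for dc in (0, half):
            blocks += split(image, r0 + dr, c0 + dc, half, max_range)
    return blocks


image = [[100, 100, 0, 0],
         [100, 100, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
print(split(image, 0, 0, 4, 10))
# [(0, 0, 2), (0, 2, 2), (2, 0, 2), (2, 2, 2)]
```

A full split-and-merge method would next merge the three dark quadrants into one region, since they satisfy the same homogeneity test together.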
These techniques provide better performance than the prior single-level methods, but the results are inadequate because these methods do not properly handle video objects that contain completely different grayscales or colors. An additional drawback is that research in the motion-oriented segmentation domain assumes that a semantic object has homogeneous motion.
Well-known attempts have been made to deal with these problems. These include the Hough transform, multi-resolution region growing, and relaxation clustering. But each of these methods is based on optical flow estimation, a technique known to frequently produce inaccurate motion boundaries. In addition, these methods are not suitable for semantic video object extraction because they employ only homogeneous motion information, while a semantic video object can have complex motions inside the object (e.g. rigid-body motion).
In an attempt to overcome these limitations, subsequent research focused on object tracking. This is a class of methods related to semantic video object extraction and premised on estimating an object's current dynamic state from a previous one, where the trajectory of dynamic states is temporally linked. Different features of an image have been used for tracking frame-to-frame changes, e.g., tracking points, intensity edges, and textures. But these features do not include semantic information about the object being tracked; simply tracking control points or features ignores important information about the nature of the object that can be used to facilitate encoding and decoding compressed data. Notwithstanding significant research in video compression, little of this research considers semantic video object tracking.
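The idea of estimating the current state from the previous one can be sketched with a single tracked point. This is a generic constant-velocity illustration, not a method from the cited research; the state layout `(x, y, vx, vy)` and the names `predict` and `track` are assumptions made for the example.

```python
def predict(state):
    """Predict the next position from a state (x, y, vx, vy),
    assuming the velocity stays constant between frames."""
    x, y, vx, vy = state
    return (x + vx, y + vy)


def track(state, candidates):
    """Pick the candidate point nearest to the predicted position and
    return the updated state (x, y, vx, vy)."""
    px, py = predict(state)
    x, y = min(candidates, key=lambda p: (p[0] - px) ** 2 + (p[1] - py) ** 2)
    return (x, y, x - state[0], y - state[1])


state = (10, 10, 2, 0)  # at (10, 10), moving 2 pixels right per frame
state = track(state, [(12, 10), (30, 30)])
print(state)  # (12, 10, 2, 0)
```

Note that the point carries no knowledge of which object it belongs to, which is exactly the missing semantic information that the paragraph identifies.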
Recently, some effort has been invested in the semantic video object extraction problem with tracking. (See Gu Ph.D.; C. Gu, T. Ebrahimi and M. Kunt, Morphological moving object segmentation and tracking for content-based video coding, International Symposium on Multimedia Communication and Video Coding, New York, 1995, Plenum Press.) This research primarily attempts to segment a dynamic image sequence into regions with homogeneous motions that correspond to real moving objects. A joint spatio-temporal method for representing spatial and temporal relationships between objects in a video sequence was developed using a morphological motion tracking approach. However, this method relies on the estimated optical flow, which, as noted above, generally is not sufficiently accurate.

Affiliated with

Gu Chuang

Inventor

Lee Ming-Chieh

Inventor

Also associated with

Johns Andrew W.

Examiner

Klarquist & Sparkman, LLP

Law Firm

Microsoft Corporation

Corporate Assignee

Tabatabai Abolfazl

Examiner
