Scene model generation from video for use in video processing

Pulse or digital communications – Bandwidth reduction or expansion – Television or motion video signal

Reexamination Certificate


Details

Classification: C382S154000
Type: Reexamination Certificate
Status: Active
Patent Number: 06738424


BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates generally to systems for processing digital video data, and more particularly to a method by which video background data can be modeled for use in video processing applications.
2. Description of the Related Art
Full-motion video displays based upon analog video signals have long been available in the form of television. With recent increases in computer processing capabilities and affordability, full motion video displays based upon digital video signals are becoming more widely available. Digital video systems can provide significant improvements over conventional analog video systems in creating, modifying, transmitting, storing, and playing full-motion video sequences.
Digital video displays involve large numbers of image frames that are played or rendered successively at rates of between 10 and 60 frames per second. Each image frame is a still image formed from an array of pixels according to the display resolution of a particular system. As examples, NTSC-based systems have display resolutions of 720×486 pixels, and high-definition television (HDTV) systems under development have display resolutions of 1920×1080 pixels.
The amounts of raw digital information included in video sequences are massive. Storage and transmission of these amounts of video information are infeasible with conventional personal computer equipment. With reference to a digitized form of the NTSC image format having a 720×486 pixel resolution, a full-length motion picture of two hours in duration could correspond to 113 gigabytes of digital video information. By comparison, conventional compact optical disks have capacities of about 0.6 gigabytes, magnetic hard disks have capacities of 10-20 gigabytes, and compact optical disks under development have capacities of up to 8 gigabytes.
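To see where a figure like 113 gigabytes comes from, the arithmetic below reproduces it under two assumptions that the text does not state: 4:2:0 chroma subsampling (12 bits, i.e. 1.5 bytes, per pixel) and a 30 frame-per-second playback rate.

```python
# Back-of-the-envelope estimate of uncompressed video storage.
# Assumptions (not stated above): 4:2:0 chroma subsampling -> 1.5 bytes
# per pixel, and a 30 frames-per-second rate.
pixels_per_frame = 720 * 486             # NTSC resolution: 349,920 pixels
bytes_per_frame = pixels_per_frame * 1.5
seconds = 2 * 60 * 60                    # a two-hour motion picture
total_bytes = bytes_per_frame * 30 * seconds
print(f"{total_bytes / 1e9:.1f} GB")     # ~113.4 GB, matching the figure above
```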
In response to the limitations in storing or transmitting such massive amounts of digital video information, various video compression standards or processes have been established, including the Moving Picture Experts Group (MPEG) standards (e.g., MPEG-1, MPEG-2, and MPEG-4) and the ITU-T H.26x recommendations. These conventional video compression techniques utilize similarities between successive image frames, referred to as temporal or interframe correlation, to provide interframe compression, in which pixel-based representations of image frames are converted to motion representations. In addition, they utilize similarities within image frames, referred to as spatial or intraframe correlation, to provide intraframe compression, in which the remaining image data within a frame are further compressed. Intraframe compression is based upon conventional processes for compressing still images, such as discrete cosine transform (DCT) encoding.
Although differing in specific implementations, the MPEG-1, MPEG-2, and H.26X video compression standards are similar in a number of respects. The following description of the MPEG-2 video compression standard is generally applicable to the others.
MPEG-2 provides interframe compression and intraframe compression based upon square blocks or arrays of pixels in video images. A video image is divided into transformation blocks having dimensions of 16×16 pixels. For each transformation block T_N in an image frame N, a search is performed across the image of the next successive video frame N+1 or the immediately preceding image frame N−1 (i.e., bidirectionally) to identify the most similar respective transformation block T_{N+1} or T_{N−1}.
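As a concrete illustration of this search, the sketch below (hypothetical Python/NumPy code, not taken from the patent) scans a window of frame N+1 for the 16×16 block that best matches a given transformation block from frame N, scoring candidates by the sum of absolute differences (SAD), a common matching criterion.

```python
import numpy as np

BLOCK = 16  # transformation block size used in the MPEG-2 description above

def best_match(frame_next, block, top, left, search_radius=8):
    """Exhaustively search frame N+1 around (top, left) for the block
    that best matches `block` from frame N, scored by the sum of
    absolute differences (SAD). Returns the translational vector
    (dx, dy) and the residual SAD of the best candidate."""
    h, w = frame_next.shape
    best_sad, best_dx, best_dy = np.inf, 0, 0
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + BLOCK > h or x + BLOCK > w:
                continue  # candidate would fall outside the frame
            candidate = frame_next[y:y + BLOCK, x:x + BLOCK]
            sad = np.abs(candidate.astype(np.float64)
                         - block.astype(np.float64)).sum()
            if sad < best_sad:
                best_sad, best_dx, best_dy = sad, dx, dy
    return (best_dx, best_dy), best_sad

# Demonstration: when frame N+1 is a pure translation of frame N, the
# search recovers the motion exactly and the residual SAD is zero.
rng = np.random.default_rng(0)
frame_n = rng.integers(0, 256, (64, 64)).astype(np.float64)
frame_n1 = np.roll(frame_n, (2, 3), axis=(0, 1))  # shift down 2, right 3
vec, sad = best_match(frame_n1, frame_n[16:32, 16:32], 16, 16)
print(vec, sad)  # -> (3, 2) 0.0
```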
Ideally, and with reference to a search of the next successive image frame, the pixels in transformation blocks T_N and T_{N+1} are identical, even if the transformation blocks have different positions in their respective image frames. Under these circumstances, the pixel information in transformation block T_{N+1} is redundant to that in transformation block T_N. Compression is achieved by substituting the positional translation between transformation blocks T_N and T_{N+1} for the pixel information in transformation block T_{N+1}. In this simplified example, a single translational vector (ΔX, ΔY) is designated for the video information associated with the 256 pixels in transformation block T_{N+1}.
Frequently, the video information (i.e., pixels) in the corresponding transformation blocks T_N and T_{N+1} is not identical. The difference between them is designated the transformation block error E, which often is significant. Although it is compressed by a conventional compression process such as discrete cosine transform (DCT) encoding, the transformation block error E is cumbersome and limits the extent (ratio) and the accuracy with which video signals can be compressed.
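As a sketch of this residual-coding step, again hypothetical and assuming SciPy's scipy.fft module: given a transformation block from frame N and its matched block from frame N+1, the code forms the block error E, takes its 2-D DCT, and zeroes small coefficients as a crude stand-in for the quantization a real codec would apply.

```python
import numpy as np
from scipy.fft import dctn, idctn

def encode_block_error(block_n, matched_block, threshold=10.0):
    """Form the transformation block error E and DCT-encode it.
    Zeroing coefficients below `threshold` is an illustrative
    stand-in for real quantization."""
    error = matched_block.astype(np.float64) - block_n.astype(np.float64)
    coeffs = dctn(error, norm='ortho')        # 2-D DCT of the residual E
    coeffs[np.abs(coeffs) < threshold] = 0.0  # crude "quantization"
    return coeffs

def decode_block_error(coeffs):
    """Invert the DCT to recover an approximation of E."""
    return idctn(coeffs, norm='ortho')
```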
Large transformation block errors E arise in block-based video compression methods for several reasons. Block-based motion estimation represents only translational motion between successive image frames. The only changes between corresponding transformation blocks T_N and T_{N+1} that can be represented are changes in the relative positions of the blocks. A disadvantage of such representations is that full-motion video sequences frequently include complex motions other than translation, such as rotation, magnification, and shear. Representing such complex motions with simple translational approximations results in significant errors.
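The short experiment below (illustrative only; it reuses the hypothetical best_match helper from the earlier sketch and scipy.ndimage.rotate) makes this concrete: when frame N+1 is a slightly rotated copy of frame N, the best purely translational match still leaves a large residual, because no single vector (ΔX, ΔY) can represent rotation.

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(1)
frame_n = rng.integers(0, 256, (64, 64)).astype(np.float64)
# Frame N+1 is the same scene rotated by 5 degrees about its center:
# complex motion that a translational model cannot capture.
frame_n1 = np.clip(rotate(frame_n, angle=5.0, reshape=False, mode='nearest'),
                   0, 255)

vec, sad = best_match(frame_n1, frame_n[24:40, 24:40], 24, 24)
print(vec, sad)  # the residual SAD stays large: the block error E is significant
```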
Another aspect of video displays is that they typically include multiple image features or objects that change or move relative to each other. Objects may be distinct characters, articles, or scenery within a video display. With respect to a scene in a motion picture, for example, each of the characters (i.e., actors) and articles (i.e., props) in the scene could be a different object.
The relative motion between objects in a video sequence is another source of significant transformation block errors E in conventional video compression processes. Due to the regular configuration and size of the transformation blocks, many of them encompass portions of different objects. Relative motion between the objects during successive image frames can result in extremely low correlation (i.e., high transformation errors E) between corresponding transformation blocks. Similarly, the appearance of portions of objects in successive image frames (e.g., when a character turns) also introduces high transformation errors E.
Conventional video compression methods appear to be inherently limited due to the size of transformation errors E. With the increased demand for digital video storage, transmission, and display capabilities, improved digital video compression processes are required.
Motion estimation plays an important role in video compression, multimedia applications, digital video archiving, video browsing, and video transmission. It is well known in the art that in video scenes there exists a high temporal (i.e., time-based) correlation between consecutive video image frames. The bit rate for compressing a video scene can be reduced significantly if this temporal correlation is used to estimate the motion between consecutive video image frames.
For example, in block-based video compression schemes such as MPEG-1 and MPEG-2, block matching is used to take advantage of temporal correlation. Each of the consecutive video image frames is divided into multiple blocks of pixels, referred to as pixel blocks. Corresponding pixel blocks are identified in consecutive video image frames, motion transformations between the corresponding pixel blocks are determined, and the differences between the transformed pixel blocks represent error signals.
MPEG-4 describes a format for representing video in terms of objects and backgrounds, but stops short of specifying how the background and foreground objects are to be obtained from the source video. An MPEG-4 visual scene may consist of one or more video objects. Each video object is characterized by temporal and spatial information in the form of shape, motion, and texture.
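By way of illustration only, and not as the method this patent claims, one common way to obtain a background model from source video is a per-pixel temporal median over a buffer of frames; moving foreground objects then appear as large deviations from that model.

```python
import numpy as np

def background_model(frames):
    """Estimate a static background as the per-pixel temporal median over
    a stack of grayscale frames (each a 2-D array of identical shape)."""
    return np.median(np.stack(frames, axis=0), axis=0)

def foreground_mask(frame, background, threshold=25.0):
    """Mark pixels that deviate strongly from the background model;
    the threshold is an illustrative, scene-dependent choice."""
    return np.abs(frame.astype(np.float64) - background) > threshold
```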
