Computer graphics processing and selective visual display system – Computer graphics processing – Graphic manipulation
Reexamination Certificate
2001-01-17
2003-12-30
Zimmerman, Mark (Department: 2671)
C345S582000, C345S589000, C345S620000, C382S164000, C382S173000, C382S180000
active
06670963
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to analysis of video quality, and more particularly to an improved visual attention model for automatically determining regions of interest within images of a video signal.
Models of early visual systems that are tuned appropriately provide accurate predictions of the location of visible distortions in compressed natural images. To produce an estimate of subjective quality from fidelity maps, current state-of-the-art quality metrics perform a simple summation of all visible errors. This fails to take into account any higher level or cognitive factors that are known to occur during the subjective assessment of picture quality.
The effect that a distortion has on overall picture quality is known to depend strongly on its location with respect to scene content. The variable resolution nature of the Human Visual System (HVS) means that high acuity is only available in the fovea, which has a diameter of about 2 degrees. Knowledge of a scene is obtained through regular eye movements to reposition the area under foveal view. Early vision models assume an “infinite fovea”, i.e., the scene is processed under the assumption that all areas are viewed by the high acuity fovea. However, studies of eye movements indicate that viewers do not foveate all areas in a scene equally. Instead, a few areas are identified as regions of interest (ROIs) by human visual attention processes, and viewers tend to return repeatedly to these ROIs rather than to other areas that have not yet been foveated. The fidelity of the picture in these ROIs is known to have the strongest influence on overall picture quality.
The knowledge of human visual attention and eye movements, coupled with selective and correlated eye movement patterns of subjects when viewing natural scenes, provides a framework for the development of computational models of human visual attention. The studies have shown that people's attention is influenced by a number of different features that are present in the picture—motion, luminance contrast, color contrast, object size, object shape, people and faces, location within the scene, and whether the object is part of the foreground or background. A handful of simple visual attention models have been proposed in the literature. These models aim to detect the ROIs in a scene in an unsupervised manner. They have generally been designed for use with uncomplicated still images. A number of deficiencies are apparent in these models which prevent their use as robust attention models for typical entertainment video. These include: the limited number of attention features used; the failure to apply different weights to the different features; the lack of robustness in segmentation techniques; the absence of a temporal model; and the oversimplified algorithms used to extract the attention features. None of the proposed models have been demonstrated to work robustly across a wide range of picture content and their correlation to people's eye movements has not been reported.
The paper entitled “A Perceptually Based Quantization Technique for MPEG Encoding”, Proceedings SPIE 3299, Human Vision and Electronic Imaging III, San Jose, USA, pp. 48-159, 26-29 January 1998, by Wilfried Osberger, Anthony J. Maeder and Neil Bergmann, discloses a technique for automatically determining the visually important areas in a scene as Importance Maps (IMs). These maps are generated by combining factors known to influence human visual attention and eye movements, as indicated above. For encoding, lower quantization is assigned to visually important areas, and harsher quantization is assigned to areas of lesser visual importance. Results indicate a subjective improvement in picture quality.
In this prior technique, segmentation is performed using a classic recursive split-and-merge segmentation. After segmentation, the results are processed by five spatial features to produce individual spatial importance maps: contrast, size, shape, location and background. Motion is also taken into consideration to produce a temporal importance map. Each of these individual importance maps is squared to enhance areas of high importance, and the maps are then weighted equally to produce a final IM. However, it was felt that this technique was not robust enough.
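The combination step of the prior technique can be sketched as follows. This is only a minimal illustration, not the implementation from the cited paper: the function name `prior_importance_map`, the assumption that map values lie in [0, 1], and the reading of "weighted equally" as a plain average are all assumptions made here, and the two sample maps are made-up values.

```python
def prior_importance_map(feature_maps):
    """Square each per-feature importance map, then average with equal weights.

    feature_maps: list of equally sized 2-D lists; values assumed in [0, 1].
    Averaging is an assumed interpretation of "weighted equally".
    """
    if not feature_maps:
        raise ValueError("at least one feature map is required")
    rows, cols = len(feature_maps[0]), len(feature_maps[0][0])
    total = [[0.0] * cols for _ in range(rows)]
    for fmap in feature_maps:
        for r in range(rows):
            for c in range(cols):
                # Squaring shrinks small values more than large ones,
                # so areas of high importance stand out.
                total[r][c] += fmap[r][c] ** 2
    n = len(feature_maps)
    return [[value / n for value in row] for row in total]


# Hypothetical 2x2 maps for two of the features (contrast and location).
contrast = [[0.9, 0.2], [0.1, 0.5]]
location = [[0.8, 0.4], [0.2, 0.5]]
im = prior_importance_map([contrast, location])
```

Because every feature receives the same weight here, a feature that attracts attention strongly counts no more than one that attracts it weakly, which is one of the deficiencies listed above for earlier models.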
What is desired is an automatic way, more robust than prior techniques, to predict where ROIs are likely to be located in a natural scene of a typical entertainment video using the properties of human attention and eye movements.
BRIEF SUMMARY OF THE INVENTION
Accordingly, the present invention provides a method for automatically identifying regions of interest in a video picture using a visual attention model. A current frame is adaptively segmented into regions based upon both color and luminance. Each region is processed in parallel by a plurality of spatial feature algorithms, including color and skin, to produce respective spatial importance maps. The spatial importance maps are combined to produce an overall spatial importance map, the combining being based upon weights derived from eye movement studies. The current frame and a previous frame are also processed to produce motion vectors for the current frame, after which the motion vectors are corrected for camera motion before being converted into a temporal importance map. The overall spatial importance map and the temporal importance map are combined by linear weighting to produce a total importance map for the current frame, with the linear weighting constant being derived from the eye movement studies.
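The two combining steps described in this summary can be sketched as follows. This is a hedged illustration under stated assumptions rather than the claimed method: the helper names `weighted_sum` and `total_importance`, the normalization of the spatial weights so that they sum to 1, and the form k * spatial + (1 - k) * temporal for the linear weighting are assumptions made here. The weight values and the constant 0.6 are placeholders, not figures from the eye movement studies.

```python
def weighted_sum(maps, weights):
    """Combine equally sized 2-D maps into one map using one weight per map."""
    if not maps or len(maps) != len(weights):
        raise ValueError("need one weight per map and at least one map")
    rows, cols = len(maps[0]), len(maps[0][0])
    out = [[0.0] * cols for _ in range(rows)]
    for fmap, weight in zip(maps, weights):
        for r in range(rows):
            for c in range(cols):
                out[r][c] += weight * fmap[r][c]
    return out


def total_importance(spatial_maps, spatial_weights, temporal_map, k_combine):
    """Weighted spatial combination followed by a linear spatial/temporal blend.

    spatial_weights: per-feature weights (placeholders here), normalized to
    sum to 1, which is an assumption of this sketch.
    k_combine: assumed linear weighting constant in [0, 1].
    """
    weight_sum = sum(spatial_weights)
    normalized = [w / weight_sum for w in spatial_weights]
    spatial = weighted_sum(spatial_maps, normalized)
    # Assumed form of the linear weighting: k * spatial + (1 - k) * temporal.
    return weighted_sum([spatial, temporal_map], [k_combine, 1.0 - k_combine])


# Hypothetical 1x2 maps: the left region is colorful, the right one is
# skin-toned and moving.
color = [[1.0, 0.0]]
skin = [[0.0, 1.0]]
temporal = [[0.0, 1.0]]
total = total_importance([color, skin], [3.0, 1.0], temporal, 0.6)
```

Keeping the spatial and temporal stages separate mirrors the structure of the summary: the per-feature weights and the spatial/temporal constant are independent parameters, so each can be set from measured eye movement data without affecting the other.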
The objects, advantages and other novel features of the present invention are apparent from the following detailed description when read in conjunction with the appended claims and attached drawing.
REFERENCES:
patent: 6442203 (2002-08-01), Demos
Chang et al., “VideoQ: An Automated Content Based Video Search System Using Visual Cues”, ACM Multimedia 97, Seattle, Washington, USA, pp. 313-324.*
Osberger et al., “A Perceptually Based Quantization Technique for MPEG Encoding”, SPIE Conference, Jan. 1998.*
Vleeschouwer et al., “A Fuzzy Logic System Able to Detect Interesting Areas of a Video Sequence”, SPIE vol. 3016, 1997, pp. 234-245.*
Osberger et al., “Automatic Identification of Perceptually Important Regions in an Image”, IEEE, Aug. 1998.*
Wilfried Osberger & Anthony J. Maeder, “Automatic Identification of Perceptually Important Regions in an Image”, IEEE: 14th Conference on Pattern Recognition, Aug. 1998.
Wilfried Osberger, Anthony J. Maeder & Neil Bergmann, “A Perceptually Based Quantization Technique for MPEG Encoding”, SPIE 3299 Conference, Jan. 1998.
C. De Vleeschouwer, X. Marichal, T. Delmot & B. Macq, “A Fuzzy Logic System Able to Detect Interesting Areas of a Video Sequence”, SPIE vol. 3016.
Stephen P. Etz & Jiebo Luo, “Ground Truth for Training and Evaluation of Automatic Main Subject Detection”, SPIE 3959 Human Vision & Electronic Imaging V, Jan. 2000.
Laurent Itti, Christof Koch & Ernst Niebur, “A Model of Saliency-Based Visual Attention for Rapid Scene Analysis”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, No. 11, Nov. 1998.
Jiying Zhao, Yoshihisa Shimazu, Koji Ohta, Rina Hayasaka & Yutaka Matsushita, “An Outstandingness Oriented Image Segmentation and Its Application”, ISSPA, Aug. 1996.
Anthony Maeder, Joachim Diederich & Ernst Niebur, “Limiting Human Perception for Image Sequences”, SPIE vol. 2657.
Gray Francis I.
Nguyen Kimbinh T.
Tektronix Inc.
Zimmerman Mark