Image analysis – Pattern recognition – Classification
Reexamination Certificate
1998-06-08
2002-05-14
Au, Amelia M. (Department: 2623)
C382S250000, C382S278000, C708S405000
active
06389169
ABSTRACT:
BACKGROUND
1. Field of the Invention
The present invention relates generally to image processing systems and, more particularly, to systems and methods for processing image data based upon predetermined regions of human visual interest.
2. Background of the Invention
The Scanpath Theory of human vision, proposed by Noton and Stark in 1971, suggests that a top-down, internal cognitive model of what a person sees when actively looking at an image guides active eye movements of the person and controls and/or influences the person's perception of the image being viewed. Stated somewhat differently, Noton and Stark suggest that eye movements utilized in visually examining an image are generated based at least in part upon an internal cognitive model that has been developed by a person through experience. The term “top down processing” as used herein denotes image processing that proceeds with some assumed knowledge regarding the type of image being viewed or image data being analyzed. Thus, the Scanpath Theory posits that when a person views an image, the eye movements of the person will follow a pattern that is premised upon knowledge of the type of image that is being viewed and/or similar types of images.
The Scanpath Theory recognizes that active eye movements comprise an essential part of visual perception, because these eye movements carry the fovea, a region of high visual acuity in the retina, into each part of an image to be processed. Thus, the Scanpath Theory posits that an internal cognitive model drives human eye movements in a repetitive, sequential set of saccades and fixations (“glances”) over specific regions-of-interest (“ROIs”) in a scene, with the subconscious aim of confirming the top-down, internal cognitive model—the “Mind's Eye”, so to speak.
Experimental investigation of the Scanpath Theory has involved presenting a complex visual stimulus (such as a scenic photograph) to a human subject and recording the eye movements made by the subject while looking at the presented image. Thus, computer-controlled experiments present an image and carefully measure the subject's eye movements using video cameras. Eye movement recordings are then represented as sequences of alternating glances (saccades and fixations), where each glance generally lasts about 300 milliseconds. Every glance the subject makes while looking at the image enables the high resolution fovea of the retina to abstract information from the image during the fixation period, identifying a fixation point on the image as a visual region-of-interest, or ROI. This is shown, for example, in FIGS. 8a, 8b and 9.
Diametrically opposed to the Scanpath Theory, current methods for computerized image processing are usually intended to detect and localize specific features in a digital image in a “bottom-up” fashion, analyzing, for example, spatial frequency, texture conformation, or other informative values of loci of the visual stimulus. The term “bottom up processing” is used herein to denote processing methods that assume no knowledge of an image being viewed or image data being processed. Prior art methods that have been proposed in the literature can be classified into three principal approaches:
1. Structural Methods are based on an assumption that images have detectable and recognizable primitives, which are distributed according to some placement rules—examples of prior art methods that use such an approach are matched filters.
2. Statistical Methods are based on statistical characteristics of the texture of the picture—examples of prior art methods that use a statistical approach are Co-Occurrence Matrices and Entropy Functions.
3. Modeling Methods hypothesize underlying processes for generating local regions of visual interest—examples of prior art that use a modeling approach are Fractal Descriptors.
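As an illustration of the statistical approach named in item 2, the following is a minimal sketch of a gray-level co-occurrence matrix and an entropy function computed from it. It is not taken from any cited reference; the function names, the pixel offset, and the three small test images are assumptions chosen only to show that a more irregular texture yields a higher entropy value.

```python
import math
from collections import Counter

def cooccurrence(image, dx=1, dy=0):
    """Count pairs of gray levels (a, b) where b lies at offset (dx, dy) from a."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    return counts

def entropy(counts):
    """Shannon entropy, in bits, of the normalized co-occurrence counts."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical 4 x 4 test images with increasing texture irregularity.
flat = [[0, 0, 0, 0] for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
noisy = [[0, 1, 2, 3], [3, 2, 1, 0], [1, 3, 0, 2], [2, 0, 3, 1]]

e_flat = entropy(cooccurrence(flat))
e_checker = entropy(cooccurrence(checker))
e_noisy = entropy(cooccurrence(noisy))
```

In this sketch a uniform region scores zero, a regular checkerboard scores one bit, and the irregular pattern scores highest, which is the kind of purely data-driven texture measure that a bottom-up statistical method relies upon.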
U.S. Pat. No. 5,535,013, entitled “Image Data Compression and Expansion Apparatus, and Image Area Discrimination Processing Apparatus Therefor,” teaches a method of image data compression in which an image is first divided into square pixel blocks and then encoded using an orthogonal transform. This is a statistical method. The encoding process is based upon a discrete cosine transform, and is thus a JPEG algorithm. Using the coefficients of the discrete cosine transform, the method taught by U.S. Pat. No. 5,535,013 discriminates blocks containing text from blocks containing general, non-text dot images. Then, a selective quantization method is used to identify different quantization coefficients for text blocks and non-text blocks.
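The general idea of discriminating blocks by their discrete cosine transform coefficients can be sketched as follows. This is only an illustrative sketch and does not reproduce the discrimination rule of U.S. Pat. No. 5,535,013, which is not set out here. The 8 x 8 block size, the use of total AC energy as the deciding quantity, the threshold value, and the names `dct2`, `ac_energy` and `looks_like_text` are all assumptions.

```python
import math

N = 8  # assumed block size (the size used by baseline JPEG)

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block, in direct (unoptimized) form."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def ac_energy(coeffs):
    """Sum of squared coefficients, excluding the DC term at [0][0]."""
    return sum(c * c for row in coeffs for c in row) - coeffs[0][0] ** 2

def looks_like_text(block, threshold=1.0e5):
    """Hypothetical rule: sharp strokes concentrate energy in the AC terms."""
    return ac_energy(dct2(block)) > threshold

# Hypothetical blocks: a smooth gradient and alternating black/white strokes.
smooth = [[100 + x + y for y in range(N)] for x in range(N)]
strokes = [[255 * (y % 2) for y in range(N)] for x in range(N)]
```

Under these assumptions, `looks_like_text(strokes)` is true and `looks_like_text(smooth)` is false. A block classified in this way could then be assigned its own quantization coefficients, in the spirit of the selective quantization described above.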
Other bottom-up methods of image processing suggest that characterization and decomposition of an image can be based upon primitives such as color, texture, or shape. Such methods can be more powerful than the text/non-text discrimination method of U.S. Pat. No. 5,535,013, but still cannot overcome the important limitation that for a general, complex image, regions of interest are difficult to specify by a single parameter such as color or shape. This is shown, for example, in U.S. Pat. No. 5,579,471, entitled “Image Query System and Method.”
In view of the foregoing, it is submitted that those skilled in the art would find to be quite useful a method and apparatus for image processing which takes into account the underlying nature of human vision and perception, so as to selectively decompose an image into its most meaningful regions of visual interest, thereby providing a means for improving image compression, image query techniques and visual image enhancement systems.
SUMMARY OF THE INVENTION
In one particularly innovative aspect, the present invention is directed to systems and methods for image processing that utilize a cognitive model stored in memory to identify regions within an image that correlate with previously determined regions of visual interest for a given type of image or type of image data being processed.
In another innovative aspect, systems and methods in accordance with the present invention may select algorithms for processing collections of images by comparing algorithmic region of interest (aROI) data to stored human visual region of interest (hROI) data to select an optimal algorithm or group of algorithms to be used in transforming data comprising the collection or collections of images. The selected algorithms may then be used, for example, in data compression, image enhancement or database query functions.
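The selection step described in the preceding paragraph might be sketched as follows. The similarity measure used here (the fraction of hROI loci having at least one aROI locus within a fixed pixel radius) is an assumption made for illustration and is not stated in this summary; the algorithm names, coordinates and radius are likewise hypothetical.

```python
import math

def locus_similarity(aroi, hroi, radius=30.0):
    """Fraction of hROI loci with at least one aROI locus within `radius` pixels."""
    if not hroi:
        return 0.0
    hits = sum(1 for h in hroi if any(math.dist(h, a) <= radius for a in aroi))
    return hits / len(hroi)

def select_algorithm(aroi_by_algorithm, hroi, radius=30.0):
    """Return the name of the algorithm whose aROIs best match the stored hROIs."""
    return max(
        aroi_by_algorithm,
        key=lambda name: locus_similarity(aroi_by_algorithm[name], hroi, radius),
    )

# Hypothetical stored human fixation loci (x, y) for one image.
hroi = [(50, 50), (200, 80), (120, 220)]

# Hypothetical aROI loci produced by two candidate algorithms.
candidates = {
    "edge_density": [(55, 48), (190, 90), (300, 300)],
    "local_contrast": [(10, 10), (250, 250), (400, 40)],
}

best = select_algorithm(candidates, hroi)
```

With this data, the hypothetical "edge_density" candidate matches two of the three hROI loci while "local_contrast" matches none, so the former is selected. Any other measure of agreement between aROI and hROI data could be substituted for `locus_similarity` without changing the structure of the selection.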
In still another innovative aspect, the present invention is directed to systems and methods that utilize conventional image processing algorithms in combination with innovative clustering, sequencing, comparing and parsing techniques to predict loci of human fixations within an image or within collections of images for the purposes of, for example, data compression, image enhancement and image database query functions. Indeed, empirical analysis reveals that systems and methods in accordance with the present invention enable a prediction of human fixation loci that is comparable in measure to the ability of one human to predict the loci of eye movements of other persons viewing an image.
In still another innovative aspect, systems and methods in accordance with the present invention may detect regions of visual interest (ROIs) within an image based upon stored characteristic data representative of human visual perception. For example, using the method(s) of the present invention, algorithmic regions of interest (aROIs) having a high, or relatively high, correlation with human regions of visual interest (hROIs) may be developed for an image or collection of images, and thereafter an image or collection of images may be saved within a system using selected portions of the original picture (i.e., aROIs) as identification data. Then, the selected portions of the picture (i.e., saved aROI data) may be used in performing a query search. The query search may proceed, for example, by comparing saved aROIs in a database with ROIs specified by the system operator. Processing image data in this fashion should provide
Privitera Claudio M.
Stark Lawrence W.
Au Amelia M.
Dastouri Mehrdad
Lyon & Lyon LLP