Image analysis – Applications – Target tracking or detecting
Reexamination Certificate
1998-02-06
2001-05-29
Couso, Jose L. (Department: 2721)
C382S107000, C382S224000, C382S268000, C348S169000
active
06240197
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates generally to visual recognition systems and, more particularly, to a technique for disambiguating proximate objects within an image.
BACKGROUND OF THE INVENTION
An interface to an automated information dispensing kiosk represents a computing paradigm that differs from the conventional desktop environment. That is, an interface to an automated information dispensing kiosk differs from the traditional Window, Icon, Mouse and Pointer (WIMP) interface in that such a kiosk typically must detect and communicate with one or more users in a public setting. An automated information dispensing kiosk therefore requires a public multi-user computer interface.
Prior attempts have been made to provide a public multi-user computer interface and/or the constituent elements thereof. For example, a proposed technique for sensing users is described in “Pfinder: Real-time Tracking of the Human Body”, Christopher Wren, Ali Azarbayejani, Trevor Darrell, and Alex Pentland, IEEE 1996. This technique senses only a single user, and addresses only a constrained virtual world environment. Because the user is immersed in a virtual world, the context for the interaction is straightforward, and simple vision and graphics techniques are employed. Sensing multiple users in an unconstrained real-world environment, and providing behavior-driven output in the context of that environment, present more complex vision and graphics problems which are not addressed by this technique.
Another proposed technique is described in “Real-time Self-calibrating Stereo Person Tracking Using 3-D Shape Estimation from Blob Features”, Ali Azarbayejani and Alex Pentland, ICPR January 1996. The implementing system uses a self-calibrating blob stereo approach based on a Gaussian color blob model. The use of a Gaussian color blob model has the disadvantage of being inflexible.
Also, the self-calibrating aspect of this system may be applicable to a desktop setting, where a single user can tolerate the delay associated with self-calibration. However, in an automated information dispensing kiosk setting, some form of advance calibration would be preferable so as to allow a system to function immediately for each new user.
Other proposed techniques have been directed toward the detection of users in video sequences. The implementing systems are generally based on the detection of some type of human motion in a sequence of video images. These systems are considered viable because very few objects move exactly the way a human does. One such system addresses the special case where people are walking parallel to the image plane of a camera. In this scenario, the distinctive pendulum-like motion of human legs can be discerned by examining selected scan-lines in a sequence of video images. Unfortunately, this approach does not generalize well to arbitrary body motions and different camera angles.
Another system uses Fourier analysis to detect periodic body motions which correspond to certain human activities (e.g., walking or swimming). A small set of these activities can be recognized when a video sequence contains several instances of distinctive periodic body motions that are associated with these activities. However, many body motions, such as hand gestures, are non-periodic, and in practice, even periodic motions may not remain visible long enough for their periodicity to be identified.
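By way of illustration only, the periodicity test that such a Fourier-based system relies on might be sketched as follows. The sketch assumes a hypothetical one-dimensional measurement taken once per video frame (for example, the width of a silhouette); the function name `dominant_frequency` and the sample signals are assumptions made for this example and are not taken from the cited system.

```python
import cmath
import math

def dominant_frequency(signal):
    """Return (k, share): the non-zero DFT bin holding the most power,
    and that bin's share of the total non-DC power in the half spectrum."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the DC component
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        powers.append(abs(coeff) ** 2)
    total = sum(powers)
    if total == 0:
        return 0, 0.0  # constant signal: no motion at all
    best = max(range(len(powers)), key=powers.__getitem__)
    return best + 1, powers[best] / total

# Hypothetical per-frame measurements over 64 frames.
walking = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]  # 4 cycles
gesture = [1.0 if 20 <= t < 24 else 0.0 for t in range(64)]        # one brief event

walk_bin, walk_share = dominant_frequency(walking)
gest_bin, gest_share = dominant_frequency(gesture)
```

A periodic motion concentrates nearly all of its power in a single bin, whereas a brief, non-periodic gesture spreads its power across many bins, which is consistent with the limitation noted above.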
Another system uses action recognition to identify specific body motions such as sitting down, waving a hand, etc. In this approach, a set of models for the actions to be recognized is stored and an image sequence is filtered using the models to identify the specific body motions. The filtered image sequence is thresholded to determine whether a specific action has occurred or not. A drawback of this system is that a stored model for each action to be recognized is required. This approach also does not generalize well to the case of detecting arbitrary human body motions.
Recently, an expectation-maximization (EM) technique has been proposed to model pixel movement using simple affine flow models. In this technique, the optical flow of images is segmented into one or more independent rigid body motion models of individual body parts. However, for the human body, movement of one body part tends to be highly dependent on the movement of other body parts. Treating the parts independently leads to a loss in detection accuracy.
The above-described proposed techniques either do not allow users to be detected in a real-world environment in an efficient and reliable manner, or do not allow users to be detected without some form of clearly defined user-related motion. These shortcomings present significant obstacles to providing a fully functional public multi-user computer interface. Accordingly, it would be desirable to overcome these shortcomings and provide a technique for allowing a public multi-user computer interface to detect users.
OBJECTS OF THE INVENTION
The primary object of the present invention is to provide a technique for disambiguating proximate objects within an image.
The above-stated primary object, as well as other objects, features, and advantages, of the present invention will become readily apparent from the following detailed description which is to be read in conjunction with the appended drawings.
SUMMARY OF THE INVENTION
According to the present invention, a technique for disambiguating proximate objects within an image is provided. The technique can be realized by having a processing device such as, for example, a digital computer, obtain an image which is a representation of a plurality of pixels, wherein at least one grouping of substantially adjacent pixels has been identified in the plurality of pixels.
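The summary above presumes that groupings of substantially adjacent pixels have already been identified in the image. One common way such groupings could be obtained, shown here purely as an assumption and not as the method of the invention, is 8-connected component labeling of a binary image. The function name `find_groupings` and the sample image are hypothetical.

```python
from collections import deque

def find_groupings(image):
    """Return a list of groupings, each a list of (row, col) positions of
    enabled (non-zero) pixels that are 8-connected to one another."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    groupings = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited enabled pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            group = []
            while queue:
                y, x = queue.popleft()
                group.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            groupings.append(group)
    return groupings

# Hypothetical binary image: 1 marks an enabled pixel.
image = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
groupings = find_groupings(image)
```

Strict 8-connectivity is a narrower criterion than "substantially adjacent"; a system tolerant of small gaps would need a looser clustering rule.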
The processing device identifies discontinuities in each of the identified groupings of substantially adjacent pixels. Such discontinuities are typically areas along an outer edge of each of the identified groupings of substantially adjacent pixels that should not be included in each of the identified groupings of substantially adjacent pixels. That is, these areas do not contain any, or substantially any, enabled pixels.
The processing device divides each of the identified groupings of substantially adjacent pixels according to the identified discontinuities. That is, each of the identified groupings of substantially adjacent pixels is divided to eliminate areas that do not contain any, or substantially any, enabled pixels.
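The identifying and dividing steps described above could be sketched, under stated assumptions, as a column projection: count the enabled pixels in each column of a grouping, treat columns holding at most a small number of enabled pixels as discontinuities, and split the grouping there. The use of columns, the function name `split_grouping`, and the `max_enabled` parameter are assumptions introduced for this example; the text above does not specify how discontinuities are located.

```python
def split_grouping(pixels, max_enabled=0):
    """Split a grouping, given as a list of (row, col) enabled pixels, at
    columns that contain at most max_enabled enabled pixels. Pixels in
    those sparse columns are dropped from the resulting sub-groupings."""
    if not pixels:
        return []
    counts = {}
    for _, col in pixels:
        counts[col] = counts.get(col, 0) + 1
    left, right = min(counts), max(counts)
    runs = []      # inclusive (first_col, last_col) ranges of dense columns
    start = None
    # Scan one column past the right edge so the final run is closed.
    for col in range(left, right + 2):
        dense = counts.get(col, 0) > max_enabled
        if dense and start is None:
            start = col
        elif not dense and start is not None:
            runs.append((start, col - 1))
            start = None
    return [[p for p in pixels if lo <= p[1] <= hi] for lo, hi in runs]

# Hypothetical grouping: two 3x3 blobs joined by a one-pixel bridge in column 3.
blob_a = [(r, c) for r in range(3) for c in range(0, 3)]
blob_b = [(r, c) for r in range(3) for c in range(4, 7)]
joined = blob_a + [(1, 3)] + blob_b

parts = split_grouping(joined, max_enabled=1)
unsplit = split_grouping(joined, max_enabled=0)
```

With `max_enabled=1` the thin bridge counts as containing substantially no enabled pixels, so the two blobs are separated; with `max_enabled=0` the grouping is left whole.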
The processing device determines if each of the divided identified groupings of substantially adjacent pixels corresponds to an object to be classified. The processing device can determine if each of the divided identified groupings of substantially adjacent pixels corresponds to an object to be classified by filtering each of the divided identified groupings of substantially adjacent pixels according to a shape characteristic of the object to be classified. The processing device can also determine if each of the divided identified groupings of substantially adjacent pixels corresponds to an object to be classified by filtering each of the redefined identified groupings of substantially adjacent pixels according to one or more characteristics that are common to humans. The processing device can further determine if each of the divided identified groupings of substantially adjacent pixels corresponds to an object to be classified by filtering each of the divided identified groupings of substantially adjacent pixels according to a color characteristic such as, for example, the color blue. The processing device can still further determine if each of the divided identified groupings of substantially adjacent pixels corresponds to an object to be classified by filtering each of the divided identified groupings of substantially adjacent pixels according to a texture characteristic such as, for example, a distinct pattern. The processing device can still further determine if each of the divided identified groupings of substantially adjacent pixels corresponds to an object to be classif
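As a minimal sketch of filtering by a shape characteristic, a divided grouping might be kept only if its bounding box is taller than it is wide and reasonably well filled, as a standing person would be. The function names, the aspect-ratio bounds, and the fill threshold below are hypothetical values chosen for illustration and do not come from the text above.

```python
def bounding_box(pixels):
    """Return (top, left, bottom, right) of a list of (row, col) pixels."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return min(rows), min(cols), max(rows), max(cols)

def looks_upright(pixels, min_ratio=1.5, max_ratio=5.0, min_fill=0.4):
    """Shape filter: accept a grouping whose bounding box has a
    height-to-width ratio within [min_ratio, max_ratio] and whose enabled
    pixels cover at least min_fill of that box. Thresholds are assumed."""
    top, left, bottom, right = bounding_box(pixels)
    height = bottom - top + 1
    width = right - left + 1
    ratio = height / width
    fill = len(pixels) / (height * width)
    return min_ratio <= ratio <= max_ratio and fill >= min_fill

# Hypothetical divided groupings.
tall = [(r, c) for r in range(12) for c in range(4)]   # 12 high, 4 wide
wide = [(r, c) for r in range(3) for c in range(10)]   # 3 high, 10 wide

candidates = [g for g in (tall, wide) if looks_upright(g)]
```

Color or texture filters of the kind mentioned above would follow the same pattern, with a different predicate applied to each divided grouping.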
Avery Brian Lyndall
Christian Andrew Dean
Cesari & McKenna LLP
Compaq Computer Corporation
Couso Jose L.
Mariam Daniel G.