Image analysis – Applications – Personnel identification
Reexamination Certificate
1996-10-08
2002-01-29
Coles, Edward (Department: 2622)
C382S166000, C382S165000, C358S539000, C358S538000, C358S003050, C348S587000, C348S652000, C348S655000
active
06343141
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to a low bit-rate communication system for multimedia applications, such as a video teleconferencing system, and more particularly, to a method of, and system for, identifying skin areas in video images.
DESCRIPTION OF THE RELATED ART
The storage and transmission of full-color, full-motion images is increasingly in demand. These images are used, not only for entertainment, as in motion picture or television productions, but also for analytical and diagnostic tasks such as engineering analysis and medical imaging.
There are several advantages to providing these images in digital form. For example, digital images are more susceptible to enhancement and manipulation. Also, digital video images can be regenerated accurately over several generations with only minimal signal degradation.
On the other hand, digital video requires significant memory capacity for storage and, equivalently, a high-bandwidth channel for transmission. For example, a single 512 by 512 pixel gray-scale image with 256 gray levels (one byte per pixel) requires more than 256,000 bytes of storage. A full color image requires nearly 800,000 bytes. Natural-looking motion requires that images be updated at least 30 times per second. A transmission channel for natural-looking full color moving images must therefore accommodate approximately 190 million bits per second. However, modern digital communication applications, including videophones, set-top boxes for video-on-demand, and video teleconferencing systems, have transmission channels with bandwidth limitations, so that the number of bits available for transmitting video image information is less than 190 million bits per second.
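The figures quoted above can be checked with a short calculation. The sketch below assumes one byte per pixel for the gray-scale image and three bytes per pixel for the full color image; the function name is illustrative and does not come from the patent.

```python
def raw_video_bit_rate(width, height, bytes_per_pixel, frames_per_second):
    """Return (bytes per frame, bits per second) for uncompressed video."""
    bytes_per_frame = width * height * bytes_per_pixel
    bits_per_second = bytes_per_frame * 8 * frames_per_second
    return bytes_per_frame, bits_per_second

# Assumption: 1 byte per gray-scale pixel (256 levels), 3 bytes per color pixel.
gray_bytes, _ = raw_video_bit_rate(512, 512, 1, 30)
color_bytes, color_bps = raw_video_bit_rate(512, 512, 3, 30)

print(gray_bytes)   # 262144    (more than 256,000 bytes)
print(color_bytes)  # 786432    (nearly 800,000 bytes)
print(color_bps)    # 188743680 (approximately 190 million bits per second)
```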
As a result, a number of image compression techniques, such as the discrete cosine transform (DCT), have been used to reduce the information capacity required for the storage and transmission of digital video signals. These techniques generally take advantage of the considerable redundancy in any natural image, so as to reduce the amount of data used to transmit, record, and reproduce the digital video images. For example, if the video image to be transmitted is an image of the sky on a clear day, the DCT image data has many zero components, since there is little or no variation in the objects depicted in such an image. Thus, the image information of the sky on a clear day is compressed by transmitting only the small number of non-zero data components.
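The clear-sky example can be illustrated with a small experiment. The sketch below is a plain-Python orthonormal 2-D DCT-II applied to two hypothetical 8 by 8 blocks, one uniform (standing in for clear sky) and one checkerboard (standing in for a detailed region). The block size, pixel values, and threshold are assumptions chosen for illustration, not values from the patent.

```python
import math

def dct_2d(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = scale(u) * scale(v) * total
    return coeffs

def count_significant(coeffs, threshold=1e-6):
    """Count coefficients whose magnitude exceeds a small threshold."""
    return sum(1 for row in coeffs for c in row if abs(c) > threshold)

# Hypothetical test blocks: a uniform block and a high-detail block.
flat_block = [[200] * 8 for _ in range(8)]
checker_block = [[255 if (x + y) % 2 == 0 else 0 for y in range(8)]
                 for x in range(8)]

flat_count = count_significant(dct_2d(flat_block))
checker_count = count_significant(dct_2d(checker_block))

print(flat_count)     # 1: only the DC (average) component survives
print(checker_count)  # larger: detail spreads energy over many components
```

A coder that transmits only the significant coefficients would therefore send a single value for the uniform block and many more for the detailed one.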
One problem associated with image compression techniques such as the discrete cosine transform (DCT) is that they produce lossy images, since only partial image information is transmitted in order to reduce the bit rate. A lossy image is a video image which contains distortions in the objects depicted when the decoded image content is compared with the original image content. Since most video teleconferencing or telephony applications are focused on images containing persons rather than scenery, the ability to transmit video images without distortions is important. This is because a viewer will tend to focus his or her attention on specific features (objects) contained in the video sequences, such as the faces, hands or other skin areas of the persons in the scene, rather than on items such as clothing and background scenery.
In some situations, a very good rendition of facial features contained in a video sequence is paramount to intelligibility, such as in the case of hearing-impaired viewers who may rely on lip reading. For such an application, decoded video image sequences which contain distorted facial regions can be annoying to a viewer, since such image sequences are often depicted with overly smoothed-out facial features, giving the faces an artificial quality. For example, fine facial features such as wrinkles that are present on faces found in an original video image tend to be erased in a decoded version of a compressed and transmitted video image, thus hampering the viewing of the video image.
Several techniques for reducing distortions in skin areas of transmitted images have focused on extracting qualitative information about the content of the video images, including faces, hands and the other skin areas of the persons in the scene, in order to code such identified areas with less data compression. Thus, these identified areas are coded and transmitted using a larger number of bits per second, so that such areas contain fewer distorted features when the video images are decoded.
In one technique, a sequence of video images is searched for symmetric shapes. A symmetric shape is defined as a shape which is divisible into identical halves about an axis of symmetry. An axis of symmetry is a line segment which divides an object into equal parts. Examples of symmetrical shapes include squares, circles and ellipses. If the objects in a video image are searched for symmetrical shapes, some of the faces and heads shown in the video image are identifiable. Faces and heads that are depicted symmetrically typically approximate the shape of an ellipse and have an axis of symmetry vertically positioned between the eyes, through the center of the nose and halfway across the mouth. Each half-ellipse is symmetric because each contains one eye, half of the nose and half of the mouth. However, only those faces and heads that are symmetrically depicted in the video image are recognizable, precluding the identification of heads and faces when viewed in profile (turned to the left or turned to the right), since a face or head viewed in profile does not contain an axis of symmetry. Hands and other skin areas of the persons in the scene are similarly not symmetric objects and are also not recognizable using a symmetry-based technique.
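The limitation of a symmetry-based search can be sketched with a toy measurement. The code below is not the prior-art technique itself. It assumes an object has already been reduced to a binary mask and that the candidate axis of symmetry is the vertical center line of that mask. The masks and the function name are hypothetical.

```python
def mirror_symmetry_score(mask):
    """Fraction of pixels equal to their mirror image about the vertical
    center line of the mask (1.0 means perfectly symmetric)."""
    matches = 0
    total = 0
    for row in mask:
        mirrored = row[::-1]
        for a, b in zip(row, mirrored):
            matches += (a == b)
            total += 1
    return matches / total

# Hypothetical masks: a frontal, roughly elliptical head and a profile view.
frontal = [
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0, 0],
]
profile = [
    [0, 0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],  # nose protrudes to one side
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0, 0],  # chin is offset
]

frontal_score = mirror_symmetry_score(frontal)
profile_score = mirror_symmetry_score(profile)
print(frontal_score)  # 1.0
print(profile_score)  # below 1.0, so a strict symmetry test rejects it
```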
Another technique searches the video images for specific geometric shapes such as, for example, ellipses, rectangles or triangles. Searching the video images for specific geometric shapes can often locate heads and faces, but still cannot identify hands and other skin areas of persons in the scene, since such areas are typically not represented by a specified geometric shape. Additionally, partially obstructed faces and heads which do not approximate a specified geometric shape are similarly not recognizable.
In yet another technique, a sequence of video images is searched using color (hue) to identify skin areas including heads, faces and hands. Color (hue) based identification depends on using a set of specified skin tones to search the video sequences for objects which have matching skin colors. While color (hue) based techniques are useful for identifying some hands, faces or other skin areas of a scene, many other such areas cannot be identified, since not all persons have the same skin tone. In addition, color variations in many skin areas of the video sequences will also not be detectable. This is because the use of a set of specified skin tones to search for matching skin areas precludes color-based techniques from compensating for unpredictable changes to the color of an object, such as variations attributable to background lighting and/or shading.
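The weakness of matching against a fixed set of tones can be shown with a minimal sketch. The hue window and the RGB samples below are invented for illustration and are not taken from the patent or from any measured skin data. The conversion uses Python's standard `colorsys` module.

```python
import colorsys

# Hypothetical hue window (fraction of the color wheel) standing in for
# "a set of specified skin tones".
SKIN_HUE_RANGE = (0.0, 0.1)

def matches_skin_hue(rgb, hue_range=SKIN_HUE_RANGE):
    """Return True if the pixel's hue falls inside the fixed hue window."""
    r, g, b = (c / 255.0 for c in rgb)
    hue, _, _ = colorsys.rgb_to_hsv(r, g, b)
    return hue_range[0] <= hue <= hue_range[1]

# Made-up samples: a skin pixel under neutral light, and a skin pixel
# whose color has been shifted by bluish background lighting.
neutral_light_pixel = (224, 172, 105)
bluish_light_pixel = (150, 160, 190)

neutral_match = matches_skin_hue(neutral_light_pixel)
bluish_match = matches_skin_hue(bluish_light_pixel)
print(neutral_match)  # True
print(bluish_match)   # False: the fixed window misses the shifted color
```

Widening the window would catch the second pixel but would also admit more non-skin objects, which is the trade-off a fixed tone set cannot resolve.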
Accordingly, skin identification techniques that identify hands, faces and other skin areas of persons in a scene continue to be sought.
SUMMARY OF THE INVENTION
The present invention is directed to a skin area detector for identifying skin areas in video images and, in an illustrative application, is used in conjunction with the video coder of video encoding/decoding (Codec) equipment. The skin area detector identifies skin areas in video frames by initially analyzing the shape of all the objects in a video sequence to locate one or more objects that are likely to contain skin areas. Objects that are likely to contain skin areas are further analyzed to determine if the picture elements (pixels) of any such object or objects have signal energies characteristic of skin regions. The term signal energy as used herein refers to the sum of the squares of the luminance (
Okada Hiroyuki
Rosenberg Jonathan David
Coles Edward
Lamb Twyler
Lucent Technologies - Inc.