Image analysis – Image compression or coding
Reexamination Certificate
1999-10-19
2003-12-30
Do, Anh Hong (Department: 2624)
C382S239000, C382S248000, C382S251000
active
06671407
ABSTRACT:
TECHNICAL FIELD
This invention relates to systems and methods for hashing digital bit streams such as digital images. This invention further relates to database systems and methods that utilize the hashing techniques for indexing bit streams and protecting copyrights in the bit streams.
BACKGROUND
Digital images offer many advantages over conventional media in terms of image quality and ease of transmission. However, digital images consume large amounts of memory space. With the ever increasing popularity of the Internet, digital images have become a mainstay of the Web experience, buoyed by such advances as the increasing speed at which data is carried over the Internet and improvements in browser technology for rendering such images. Every day, numerous digital images are added to Web sites around the world.
As image databases grow, the need to index them and to protect copyrights in the images is becoming increasingly important. The next generation of database management software will need to accommodate solutions for fast and efficient indexing of digital images and protection of copyrights in those digital images.
A hash function is one possible solution to the image indexing and copyright protection problem. Hash functions are used in many areas such as database management, querying, cryptography, and many other fields involving large amounts of raw data. A hash function maps large unstructured raw data into relatively short, structured identifiers (the identifiers are also referred to as “hash values” or simply “hashes”). By introducing structure and order into raw data, the hash function drastically reduces the raw data to short identifiers. This simplifies many data management issues and reduces the computational resources needed for accessing large databases.
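As a minimal sketch of this mapping, assuming Python and using SHA-256 from the standard library as a stand-in for a conventional hash function, a large block of raw data can be reduced to a short identifier and used as a lookup key. The helper names `short_id` and `build_index`, the file names, and the 16-digit truncation are hypothetical choices made for illustration only.

```python
import hashlib

def short_id(raw, digits=16):
    """Map arbitrary-length raw bytes to a short hexadecimal identifier."""
    # SHA-256 stands in for any conventional hash function; truncating to
    # `digits` hex characters is an illustrative choice, not a recommendation.
    return hashlib.sha256(raw).hexdigest()[:digits]

def build_index(items):
    """Index named raw items by their short identifiers."""
    return {short_id(data): name for name, data in items.items()}

# Two stand-in "images": large byte strings of raw pixel data.
images = {
    "a.raw": bytes(range(256)) * 4096,   # about 1 MB
    "b.raw": bytes([7]) * 1_000_000,     # about 1 MB
}
index = build_index(images)

# The lookup compares 16-character identifiers, not megabytes of raw data.
found = index.get(short_id(images["b.raw"]))
```

Sorting or searching the keys of `index` touches only the short identifiers, which is the efficiency gain the paragraph above describes.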
Thus, one property of a good hash function is the ability to produce small-size hash values. Searching and sorting can be done much more efficiently on smaller identifiers as compared to the large raw data. For example, smaller identifiers can be more easily sorted and searched using standard methods. Thus, hashing generally yields greater benefits when smaller hash values are used.
Unfortunately, there is a point at which hash values become too small and begin to lose the desirable quality of uniquely representing a large mass of data items. That is, as the size of hash values decreases, it becomes increasingly likely that more than one distinct raw data item is mapped to the same hash value, an occurrence referred to as a “collision”. Mathematically, for an alphabet of A symbols per hash digit and a hash value length l, an upper bound on the number of possible hash values is A^l. If the number of distinct raw data items is larger than this upper bound, a collision will occur.
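The bound can be checked with a few lines of arithmetic. The following Python sketch (function names are hypothetical) computes the number of possible hash values for a given alphabet size and length, and applies the pigeonhole argument: once the number of distinct items exceeds that count, at least two items must share a hash value.

```python
def max_hash_values(alphabet_size, length):
    """Upper bound on distinct hash values: alphabet_size ** length."""
    return alphabet_size ** length

def collision_guaranteed(num_items, alphabet_size, length):
    """True when there are more distinct items than possible hash values."""
    return num_items > max_hash_values(alphabet_size, length)

# Binary hash digits (A = 2) with hash length l = 16.
bound = max_hash_values(2, 16)
at_bound = collision_guaranteed(bound, 2, 16)        # items could still all differ
past_bound = collision_guaranteed(bound + 1, 2, 16)  # pigeonhole forces a collision
```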
Accordingly, another property of a good hash function is that it minimizes the probability of collision. However, if a considerable reduction in the length of the hash values can be achieved, it is sometimes justified to tolerate collisions. The length of the hash value is thus a trade-off against the probability of collision. A good hash function should minimize both the probability of collision and the length of the hash values. This is a concern for the design of both hash functions in compilers and message authentication codes (MACs) in cryptographic applications.
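To give a feel for this trade-off, the sketch below uses the standard birthday approximation, p ≈ 1 - exp(-n(n-1)/(2N)), for the chance that at least two of n items collide among N possible hash values. It assumes hash values are uniformly distributed, which is an idealization not stated in the text above, and the function name is hypothetical.

```python
import math

def collision_probability(num_items, alphabet_size, length):
    """Approximate chance that at least two of `num_items` hash values coincide."""
    space = alphabet_size ** length   # number of possible hash values
    if num_items > space:
        return 1.0                    # more items than values: collision is certain
    # Birthday approximation, assuming uniformly distributed hash values.
    return -math.expm1(-num_items * (num_items - 1) / (2 * space))

# One million items hashed to binary strings of increasing length.
n = 1_000_000
p16 = collision_probability(n, 2, 16)
p32 = collision_probability(n, 2, 32)
p64 = collision_probability(n, 2, 64)
```

Under this assumption, lengthening the hash value lowers the collision probability sharply, at the cost of larger identifiers to store and compare.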
Good hash functions have long existed for many kinds of digital data. These functions have good characteristics and are well understood. A hash function for image database management would be very useful and could potentially be used to identify images for data retrieval and copyright protection. Unfortunately, while there are many good existing functions, digital images present a unique set of challenges not experienced with other digital data, primarily because images are subject to evaluation by human observers. A slight cropping or shifting of an image does not make much difference to the human eye, but such changes appear very different in the digital domain. Thus, when using conventional hashing functions, a shifted version of an image generates a very different hash value from that of the original image, even though the images are essentially identical in appearance. Another example is the deletion of one line from an image. Most people will not notice this deletion in the image itself, yet the digital data is altered significantly when viewed in the data domain.
Human eyes are rather tolerant of certain changes in images. For instance, human eyes are much less sensitive to the high frequency components of an image than to its low frequency components. In addition, the average (i.e., the DC component) is interpreted by our eyes as the brightness of an image, and it can be changed within a range while causing only minimal visible difference to the observer. Our eyes are also unable to catch small geometric deformations in most images.
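The sensitivity of conventional hash functions to a small shift can be illustrated as follows. This is a minimal sketch that assumes a synthetic 64x64 grayscale image held as rows of pixel values and uses SHA-256 as a representative conventional hash; all helper names are hypothetical.

```python
import hashlib

def make_image(width, height):
    """Synthetic grayscale image: a smooth diagonal gradient."""
    return [[(x + y) % 256 for x in range(width)] for y in range(height)]

def shift_right(image, fill=0):
    """Shift every row right by one pixel, dropping the last column."""
    return [[fill] + row[:-1] for row in image]

def digest(image):
    """Conventional hash (SHA-256) of the raw pixel bytes."""
    data = bytes(pixel for row in image for pixel in row)
    return hashlib.sha256(data).hexdigest()

def differing_bits(hex_a, hex_b):
    """Number of bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

original = make_image(64, 64)
shifted = shift_right(original)   # nearly identical to a human observer

d_original = digest(original)
d_shifted = digest(shifted)
bits_changed = differing_bits(d_original, d_shifted)
```

The two digests are unrelated: a cryptographic hash is designed so that any change to the input, however small, alters about half of the output bits on average.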
Many of these characteristics of the human visual system can be used advantageously in the delivery and presentation of digital images. For instance, such characteristics enable compression schemes, like JPEG, to compress images with good results, even though some of the image data may be lost or go unused. There are many image restoration/enhancement algorithms available today that are specially tuned to the human visual system. Commercial photo editing systems often include such algorithms.
At the same time, these characteristics of the human visual system can be exploited for illegal or unscrupulous purposes. For example, a pirate may use advanced image processing techniques to remove copyright notices or embedded watermarks from an image without visually altering the image. Such malicious changes to the image are referred to as “attacks”, and result in changes in the data domain. Unfortunately, the user is unable to perceive these changes, allowing the pirate to successfully distribute unauthorized copies in an unlawful manner. Traditional hash functions are of little help because the original image and pirated copy hash to very different hash values, even though the images appear the same.
Accordingly, there is a need for a hash function for digital images that allows slight changes to the image which are tolerable or undetectable to the human eye, yet do not result in a different hash value. For an image hash function to be useful, it should accommodate the characteristics of the human visual system and withstand various image manipulation processes common to today's digital image processing. A good image hash function should generate the same unique identifier even after some forms of attack have been applied to the original image, provided that the altered image appears reasonably similar to the original to a human observer. However, if the modified image is visually different or the attacks cause irritation to observers, the hash function should recognize the degree of change and produce a hash value different from that of the original image.
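To make the requirement concrete, the sketch below shows one simple way a hash can be made tolerant of small changes. It is a generic block-mean hash chosen purely for illustration, not the method of this invention: the image is divided into an 8x8 grid of blocks, and each block contributes one bit according to whether its mean brightness exceeds the median block mean. The synthetic test image and all function names are assumptions.

```python
from statistics import median

def block_mean_bits(image, grid=8):
    """One bit per block: 1 if the block mean exceeds the median block mean."""
    height, width = len(image), len(image[0])
    bh, bw = height // grid, width // grid
    means = []
    for by in range(grid):
        for bx in range(grid):
            block = [image[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            means.append(sum(block) / len(block))
    cutoff = median(means)
    return "".join("1" if m > cutoff else "0" for m in means)

def hamming(bits_a, bits_b):
    """Number of positions in which two bit strings differ."""
    return sum(a != b for a, b in zip(bits_a, bits_b))

# Synthetic 64x64 grayscale gradient with pixel values in 0..189.
original = [[2 * x + y for x in range(64)] for y in range(64)]
brighter = [[p + 10 for p in row] for row in original]   # uniform brightness change
shifted = [[0] + row[:-1] for row in original]           # one-pixel shift to the right
inverted = [[255 - p for p in row] for row in original]  # visually different image

h = block_mean_bits(original)
d_brighter = hamming(h, block_mean_bits(brighter))
d_shifted = hamming(h, block_mean_bits(shifted))
d_inverted = hamming(h, block_mean_bits(inverted))
```

On this test image the brightness change and the one-pixel shift leave the bit string unchanged, while the inverted image yields a very different one. A scheme this simple offers no secrecy or resistance to deliberate attack; it only shows that tolerance to small changes is achievable in principle.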
SUMMARY
This invention concerns a system and method for hashing digital images in a way that allows modest changes to an image, which may or may not be detectable to the human eye, yet does not result in different hash values for the original and modified images.
According to one implementation, a system stores original images in a database. An image hashing unit hashes individual images to produce hash values that uniquely represent the images. The image hashing unit implements a hashing function H, which takes an image I and an optional secret random string as input, and outputs a hash value X according to the following properties:
1. For any image I_i, the hash of the image, H(I_i), is approximately random among binary strings of equal length.
2. For two distinct images, I_1 and I_2, the hash value of the first image, H(I_1), is approximately independent of the hash value of the second image, H(I_2), in that given H(I_1), one cannot predict H(I_2) without knowing a secret key used to produce H(I_1).
3. If two images I_1 and I_2 are visually the same or similar, the hash v
Koon Say-Ming William
Venkatesan Ramarathnam
Do Anh Hong
Lee & Hayes PLLC
Microsoft Corporation