Method and apparatus for automatic search for relevant...

Data processing: artificial intelligence – Machine learning

Reexamination Certificate

Details

active

06735577

ABSTRACT:

FIELD OF THE INVENTION
The instant invention relates to information search and retrieval tasks, especially to finding electronically stored images in electronic data bases.
BACKGROUND OF THE INVENTION
In these days of the internet with the bulk of information vastly and rapidly growing in the shortest time span, an ever increasing demand exists for purposive and effective search and retrieval of information held in store in data bases. The problems involved in so-called information retrieval of data which are stored electronically in data bases may be outlined as follows: A given data base comprises n data sets x_1 . . . x_n (n≧2). The search for pictures and retrieval thereof is carried out by a special method of searching in data bases. The object of the search are data sets x_1 . . . x_n which embody n pictures in electronic form. What must be found in the data base is a subset D_rel of relevant data sets (electronic images). This subset D_rel is the relevant quantity of data sets to answer a specific question by a user. In an example of searching for a picture this might be pictures of a beach at a coast of a Hawaiian island.
When applying known searching methods and devices to retrieve pictures from data bases, attempts are first made to describe the relevant subset D_rel by catchwords; subsequently the catchwords are drawn upon to make a search request. The user of the data base presents his request in textual form, usually without having knowledge of the full list of catchwords listed in the data base. In the example chosen, the user's query might include the words “beach Hawaii”. The words of this query are compared with catchwords which are stored for the pictures in the data base. Often in these cases the so-called Boolean search method is applied. This method offers the user the opportunity to link the catchwords by AND, OR, and NOT. Some methods and devices additionally permit these three operations to be given a respective weighting.
The following difficulties may have to be overcome when searching in a picture data base:
(1) How can the subset D_rel needed for the search be described systematically with words when the data sets x_i are (digitized) images?
(2) The data base often comprises a very large number of pictures (n>>100,000) and, therefore, the user cannot review and judge all those n pictures.
Fundamentally, a distinction may be made between two different approaches in the search for pictures. In one case the picture is digitized and features are extracted from the digitized image. That begins with the simplest description, using gray levels or color levels of each pixel (so-called low level features), i.e., for a picture having 1,000×1,000 pixels a total of 1,000,000 different features per picture are extracted. It ends with features referred to as high level features, such as the number of edges and corners or number of surfaces etc. The use of simple features has the advantage of permitting quick calculation. However, it is disadvantageous that such features are not very well suited to describe relevant search quantities of the picture. Although more complex features thus would be much better suited, their extraction at present still involves such great expenditure that it is almost impossible, for practical reasons, to make use of them in connection with data bases containing more than 10,000 pictures.
In another known method a human being provides catchwords to describe a picture, i.e., for each picture a list of catchwords is drawn up which refer to what is represented in the picture. This complex extraction of features has the advantage that it simplifies the characterization of relevant pictures by linking the catchwords. Technically speaking, a picture x is represented by a vector x ∈ {0, 1}^s (s is the number of all the catchwords possible). If the ith catchword is contained in the list of catchwords pertaining to the picture, the ith component x_i of vector x is 1, otherwise it is 0. Operations, such as conjunction (AND) or disjunction (OR) in this case may be represented by mathematical operations, like multiplication or addition.
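The catchword representation described above can be illustrated with a minimal Python sketch. The vocabulary, the function names (`encode`, `conjunction`, `disjunction`), and the capping of the sum at 1 for OR are assumptions made for illustration only; the source states merely that AND and OR may be represented by operations like multiplication or addition.

```python
from typing import List, Sequence

# Hypothetical catchword vocabulary (s = 4); not taken from the source.
VOCABULARY = ["beach", "hawaii", "mountain", "sunset"]


def encode(catchwords: Sequence[str],
           vocabulary: Sequence[str] = VOCABULARY) -> List[int]:
    """Map a picture's catchword list to a vector in {0, 1}^s."""
    present = set(catchwords)
    return [1 if word in present else 0 for word in vocabulary]


def conjunction(x: Sequence[int], indices: Sequence[int]) -> int:
    """AND over the selected components, expressed as a product."""
    result = 1
    for i in indices:
        result *= x[i]
    return result


def disjunction(x: Sequence[int], indices: Sequence[int]) -> int:
    """OR over the selected components, expressed as a sum capped at 1
    (the cap is an assumption so the result stays in {0, 1})."""
    return min(1, sum(x[i] for i in indices))


picture = encode(["beach", "hawaii"])
```

Here `picture` has a 1 in the positions of "beach" and "hawaii" and a 0 elsewhere, so `conjunction(picture, [0, 1])` evaluates the request "beach AND hawaii" for that picture.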
Once the search has been started, the picture search machine calculates a system relevance for each of the electronically stored picture data sets x_1 . . . x_n in respect of the search request. This calculation of the respective system relevance is an essential property of each picture search machine or picture search method. The effectiveness and quality of the calculation of the respective system relevance are of essential importance for the success of the search system. Two approaches, based on differing principles, have become generally accepted with catchword search methods for calculating the system relevance:
If the textual search request comprising only catchwords, as generated by the user, is interpreted as a vector q ∈ {0, 1}^s, the similarity between the textual search request and the respective catchword list of the pictures or picture data sets in the data base can be calculated, based on the lists of catchwords available for the electronically stored pictures. This similarity then may be used as a measure of system relevance. This approach, known as the “vector space model” is described, for instance, by G. Salton in “Automatic Information Organization and Retrieval”, McGraw-Hill, New York, 1968.
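As a rough illustration of the vector space approach, the following Python sketch scores a picture vector x against a query vector q. The source does not say which similarity measure is used; cosine similarity is assumed here as one common choice, and the function name `cosine_similarity` is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(q: Sequence[int], x: Sequence[int]) -> float:
    """Cosine of the angle between query vector q and picture vector x.

    Both vectors are assumed to lie in {0, 1}^s with the same length s.
    Returns 0.0 if either vector contains no catchwords at all.
    """
    dot = sum(a * b for a, b in zip(q, x))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_x = math.sqrt(sum(b * b for b in x))
    if norm_q == 0 or norm_x == 0:
        return 0.0
    return dot / (norm_q * norm_x)


# Hypothetical query and picture vectors over a four-word vocabulary.
query = [1, 1, 0, 0]
system_relevance = cosine_similarity(query, [1, 0, 0, 0])
```

Under this assumption a picture sharing every catchword with the query scores close to 1, and a picture sharing none scores 0.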
With another approach, a probability model is applied to the catchwords in relevant documents (estimated on the basis of the textual search request which contains nothing but catchwords), allowing the probability to be calculated that a picture is comprised by the subset D_rel, and this probability then may be taken as the measure of system relevance.
On the basis of the system relevances found for all the pictures in the data base, the pictures are put into order in accordance with the system relevances calculated and thus are presented to the user. Many times in practice, it is sufficient to find just the 100 pictures having the highest values of system relevance—a task which can be resolved much more quickly than sorting a huge number of, for instance, 1,000,000 pictures.
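The remark that finding the best 100 pictures is cheaper than a full sort can be sketched in Python with the standard library's `heapq.nlargest`, which keeps only k candidates while scanning the relevances once. The function name `top_k` and the sample values are illustrative assumptions, not part of the source.

```python
import heapq
from typing import List, Sequence, Tuple


def top_k(relevances: Sequence[float], k: int = 100) -> List[Tuple[int, float]]:
    """Return (picture index, system relevance) pairs for the k highest
    relevances, best first, without sorting the whole sequence."""
    return heapq.nlargest(k, enumerate(relevances), key=lambda pair: pair[1])


# Hypothetical system relevances for four pictures.
best_two = top_k([0.1, 0.9, 0.5, 0.7], k=2)
```

For n pictures this takes on the order of n log k comparisons rather than the n log n of a complete sort, which is the saving the paragraph refers to when n is around 1,000,000 and k is 100.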
If the user of the data base still should not be satisfied with the search result he will have to revert to his query and change the text, for example, by restricting it further. Some systems offer the user a possibility of “feedback” by way of choosing a picture which he thinks is “very similar” or “close” to the relevant documents D_rel.
Such methods have an essential disadvantage in that search queries based on identical text entries by the user in connection with a certain stock of pictures always will provide the same search result. This means that the users in search of a picture are the ones who must adapt to the catchword system of the data base in order to be able to model the individual preferences and characteristics of the data base, because the only possible means of “communication” between the data base user and the search system is the textual search query. As a rule, that requires intensive and time consuming “exploration” of the specifics of the respective data base chosen by the data base user.
SUMMARY OF THE INVENTION
It is the object of the invention to provide an improved method and apparatus for searching for a relevant subset of data sets from a quantity of data sets, especially picture data sets which are stored electronically in a data base and, at the same time, to improve the efficiency and quality of the search as well as its user friendliness.
According to one aspect of the invention a method is provided of automatically searching for relevant picture data sets in a quantity of n (n≧2) picture data sets electronically stored in a memory device, picture attributes for each of the n picture data sets being stored electronically in the memory device, and the n picture data sets as well as the stored picture attributes being adapted to be processed electronically by a processor, said method comprising:
(a) providing a first selection of picture

Profile ID: LFUS-PAI-O-3226475

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.