Very-large-scale automatic categorizer for web content

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

Classification: C707S793000
Status: active
Patent number: 06826576

BACKGROUND OF THE INVENTION
1. Field of the Invention
The invention relates to the field of data processing. More specifically, the invention relates to the automatic analysis of the content of electronic data objects and the categorization of the electronic data objects into one or more discrete categories.
2. Background Information
The Internet consists of billions of discrete pages, which can be accessed from any browser-equipped computer or appliance connected to the World Wide Web (hereinafter “Web”). The availability of so many pages simultaneously represents both a boon and a bane to the user. Information, opinion, and news are available about a vast array of topics, but the challenge is to find those pages of the Web which are most relevant to the particular needs or desires of the user at any given moment.
A number of search engines are available on the Web for free use. These search engines typically index some fraction of the pages available on the Web and let users search for information on the Web using keywords. However, a user may not know which keywords to use, or may not know how to correctly formulate a search query to find the most appropriate page(s).
Another method of organizing the Web is the use of categorical hierarchies. Certain companies have analyzed the contents of tens or hundreds of thousands of web pages, placing each page into one or more of the categories in their particular subject hierarchy. Users can then browse such subject hierarchies, or search through them based upon keywords. Such searches provide results annotated with the subject area of the target page, which can assist the user in determining whether the page might be relevant to the actual topic being searched.
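To make the browse-and-search model concrete, the following is a minimal Python sketch of a keyword search over pages that have already been placed into subject categories, with each hit annotated with its category path. The page ids, category paths, and the `build_index`/`search` helpers are hypothetical and chosen only for illustration; real directory services use far more elaborate indexing and ranking.

```python
from collections import defaultdict

# Hypothetical example data: each page has already been assigned
# (e.g. by manual classification) to one or more category paths.
PAGES = {
    "page-a": {"text": "jazz piano lessons for beginners",
               "categories": ["Arts/Music/Jazz"]},
    "page-b": {"text": "used piano price guide",
               "categories": ["Shopping/Instruments"]},
    "page-c": {"text": "history of jazz in new orleans",
               "categories": ["Arts/Music/Jazz", "Regional/US/Louisiana"]},
}


def build_index(pages):
    """Map each lowercase keyword to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, page in pages.items():
        for word in page["text"].lower().split():
            index[word].add(page_id)
    return index


def search(index, pages, query):
    """Return (page_id, categories) for pages containing every query keyword."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    # Annotate each hit with its subject area(s) so the user can judge relevance.
    return [(pid, pages[pid]["categories"]) for pid in sorted(hits)]


index = build_index(PAGES)
for page_id, cats in search(index, PAGES, "piano"):
    print(page_id, cats)
```

Here a search for "piano" returns both the jazz-lesson page and the price-guide page, and the category annotation is what lets the user tell them apart without opening either.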
FIG. 10 illustrates an exemplary prior art subject hierarchy 1002 in which multiple decision nodes (hereinafter “nodes”) 1030-1036 are hierarchically arranged into multiple parent and/or child nodes, each of which is associated with a unique subject category. For example, node 1030 is a parent node to nodes 1031 and 1032, while nodes 1031 and 1032 are child nodes of node 1030. Because nodes 1031 and 1032 are both child nodes of the same node (i.e., node 1030), nodes 1031 and 1032 are said to be siblings of one another. Additional sibling pairs in subject hierarchy 1002 include nodes 1033 and 1034, as well as nodes 1035 and 1036. It can be seen from FIG. 10 that node 1030 forms a first level 1037 of subject hierarchy 1002, while nodes 1031-1032 form a second level 1038 of subject hierarchy 1002, and nodes 1033-1036 form a third level 1039 of subject hierarchy 1002. Additionally, node 1030 is referred to as the root node of subject hierarchy 1002 because it is not a child of any other node.
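The parent, child, sibling, level, and root relationships described for FIG. 10 map directly onto a simple tree data structure. The Python sketch below rebuilds subject hierarchy 1002 under one stated assumption: the text names the sibling pairs 1033/1034 and 1035/1036 but not their parents, so the sketch assumes 1033 and 1034 are children of 1031, and 1035 and 1036 are children of 1032. The `Node` class and its methods are hypothetical illustrations, not part of the described system.

```python
class Node:
    """One decision node (subject category) in a subject hierarchy."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.parent = None
        self.children = []

    def add_child(self, child):
        # Link child under this node and return it for convenient chaining.
        child.parent = self
        self.children.append(child)
        return child

    def is_root(self):
        # A root node is not a child of any other node.
        return self.parent is None

    def level(self):
        # The root forms the first level; each step down adds one level.
        return 1 if self.parent is None else self.parent.level() + 1

    def siblings(self):
        # Siblings are the other children of the same parent.
        if self.parent is None:
            return []
        return [n for n in self.parent.children if n is not self]


# Rebuild hierarchy 1002.
# ASSUMPTION: 1033/1034 sit under 1031 and 1035/1036 under 1032;
# the text gives the sibling pairs but not their parents.
root = Node(1030)
n1031 = root.add_child(Node(1031))
n1032 = root.add_child(Node(1032))
n1033 = n1031.add_child(Node(1033))
n1034 = n1031.add_child(Node(1034))
n1035 = n1032.add_child(Node(1035))
n1036 = n1032.add_child(Node(1036))

print(root.is_root(), n1033.level(), [n.node_id for n in n1031.siblings()])
# prints: True 3 [1032]
```

Level 1 in the sketch corresponds to first level 1037, level 2 to second level 1038, and level 3 to third level 1039.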
In general, subject hierarchies are filled with pages by manual classification of individual web pages, drawing on the talents of experts in particular subject fields. This method has several problems. Finding experts to perform the classification is costly, and there is an unavoidable backlog between the time a site is placed on the Web and the time (if ever) it enters the classification hierarchy. Moreover, a grader who is expert in one subject area may misclassify a page on another subject, which can make the page more difficult for the casual browser to find.
Automatic classification is an active area of research, but existing systems typically work with only a limited number of subject fields and often perform poorly. What is desired, therefore, is an automatic system that classifies a large number of documents quickly and effectively into a large subject hierarchy.


