Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-01-24
2002-05-28
Amsbury, Wayne (Department: 2171)
C707S793000
active
06397219
ABSTRACT:
FIELD OF INVENTION
This invention relates to network based classified information systems, to methods of automatically building searchable databases of classified information derived from web pages posted on a network, and, to web pages for use in such systems and methods.
The information systems and databases of most relevance to this invention are those which include classified product and service catalogues similar to the Yellow Pages telephone books, contact indexes similar to the White Pages telephone books, and/or subject indexes similar to Library catalogues. Such information systems and databases typically include sets of associated classification, contact and/or geographic items of information. For convenience, classification, contact and/or geographic information will be hereinafter called CCG-data.
The networks with which this invention is concerned are the worldwide public computer/communications network commonly known as the Internet and private networks—sometimes called intranets—which allow common access to markup documents on computers connected to the network. Markup documents are text files prepared using various markup languages such as HyperText Markup Language (HTML) and Extensible Markup Language (XML) which are implementations (or dialects) of the Standard Generalised Markup Language (SGML). The system of accessible files on the Internet is called the World Wide Web (WWW) and the markup documents themselves are commonly called ‘web pages’. A web page is said to be ‘posted’ on a network when it is stored on computer-readable media of a host network computer as a file which is generally accessible to network users. A web page is transported from the host computer to a requesting computer through intermediate network computers as a computer-readable signal embodied in a carrier wave. Though this invention is not limited to Internet based information systems, these terms are used for convenience.
BACKGROUND TO THE INVENTION
It has been estimated that there are about 100 million web pages on the Internet and that the number is doubling every two years. Many of these pages include information concerning commercially offered goods and services and often include contact details. But the difficulty of locating such information is increasing faster than the growth in the number of web pages.
To help network users locate web pages of interest, certain network service providers create indexes (or databases) of the contents of web pages posted (stored on computer readable media so as to be generally accessible) on the network and provide ‘search engines’ to use the indexes. These indexes are often created automatically by the use of ‘web crawlers’ which (i) interrogate computer after computer on the network to locate successive web pages and (ii) index the words in each web page encountered against the network address (eg Internet Protocol Address or IPA) and filing system path or uniform resource locator (URL) at which the web page is accessible. Hereinafter the terms URL and URI (Uniform Resource Identifier) are taken to be identical in meaning and to signify network addresses and filing system paths. Usually, the indexes consist of a list of unique words, with each word having an associated list of the URLs of the web pages in which the word was found during interrogation. The URL serves as a ‘hyperlink’ which, if selected by a user/searcher, results in the associated web page being automatically transmitted from the computer where it is posted on the network to the user/searcher's computer, where it may be displayed or otherwise processed. The sending and receiving of files in this way is greatly assisted by user interface programs called ‘web browsers’ (or more simply, ‘browsers’) such as Netscape and Microsoft Internet Explorer.
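The word-to-URL index described above can be illustrated with a minimal sketch. It assumes the crawler has already fetched the page text; the function names, the tokenising rule and the sample pages are hypothetical and are not taken from any particular search engine.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each unique word to the sorted list of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Assumed tokenising rule: lower-case runs of letters and digits.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


def search(index, *keywords):
    """Return the URLs of pages containing every keyword (a simple AND search)."""
    if not keywords:
        return []
    result = None
    for keyword in keywords:
        urls = set(index.get(keyword.lower(), []))
        result = urls if result is None else result & urls
    return sorted(result)


# Hypothetical pages standing in for documents located by a web crawler.
PAGES = {
    "http://example.com/plumber": "Smith Plumbing, emergency plumber in Carlton",
    "http://example.org/smith": "John Smith home page",
}
INDEX = build_index(PAGES)
```

In this sketch a search for the single keyword "smith" returns both pages, although only one concerns a trade; the index records that a word occurs in a page, not what the word signifies there.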
The search for web pages of interest using search engines leaves much to be desired:
simple searches (those using a few keywords in simple combinations) often yield far too many web page references (URLs) to permit them to be interrogated one-by-one,
complex searches (those using many keywords and/or complex Boolean expressions) require considerable expertise to undertake,
even using optimum search criteria, many irrelevant web pages are referenced because of inconsistent use of terminology by those who author the original web pages,
even using optimum search criteria, many relevant pages are missed, again because of inconsistent use of terminology by web page authors, and
items of information included in the body of web pages cannot be ‘understood’ or associated in useful ways by web crawlers; that is, an item cannot be recognised as, say, a surname, a street name, a geographic locality, or a type of goods or services, nor can, say, a surname be strongly associated with a street name, a geographic locality, or a type of goods or services.
The result is that information provided by search engines from databases which are automatically compiled using web crawlers is a very poor equivalent of the common Yellow Pages and White Pages directories which serve the telephone industry (though these directories are not, of course, automatically compiled from web pages).
In an attempt to improve the usefulness of automatically compiled network databases, some search engine providers make use of information contained in URLs, such as the country code and top level domain name codes such as ‘com’, ‘edu’, ‘net’ and ‘org’, which are sometimes used to signify the subject matter of web pages. It has been proposed to add more content classifying codes to URLs (eg, “chem” to signify chemical subject matter) to allow specialised databases (national, commercial, chemical, etc) to be generated. However, this proposal has serious drawbacks:
URLs are Internet addresses and it is in principle undesirable to confuse the address function of a URL with that of representing a list of web page classifications or contact details.
A URL is an inappropriate container of multiple web page classification codes and contact details because the length of the URL would cause it to become unwieldy as an Internet address.
Including in a URL classification codes drawn from a list of thousands of codes would compromise the mnemonic quality of Internet addresses such as “www.yellowpages.com”.
There is substantial overlap in the subject matter contained in web pages having the various top level domain name codes.
There is no consensus on, or standard for, content classification codes in URLs.
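As an illustration of the URL-derived hints discussed above, the following sketch pulls a country code and a generic top level domain code out of a URL. It assumes that a two-letter final label is a country code and that only the four generic codes named in the text are of interest; the function name and the set of codes are hypothetical, and the URL must include a scheme such as "http://" for the host name to be parsed.

```python
from urllib.parse import urlsplit

# Assumption: only the four generic codes mentioned in the text are recognised.
GENERIC_CODES = {"com", "edu", "net", "org"}


def url_hints(url):
    """Return the generic code and country code found in the host name, if any."""
    host = urlsplit(url).hostname or ""
    labels = host.lower().split(".")
    hints = {"generic": None, "country": None}
    # Assumption: a two-letter alphabetic final label is a country code.
    if labels and len(labels[-1]) == 2 and labels[-1].isalpha():
        hints["country"] = labels.pop()
    if labels and labels[-1] in GENERIC_CODES:
        hints["generic"] = labels[-1]
    return hints
```

For "http://www.yellowpages.com/" the sketch yields the generic code "com" and no country code. That is all such a URL reveals, which reflects the drawbacks listed above: the address says little about the subject matter of the page.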
Another proposal to add content classification data to web pages has arisen from the wish to identify pages containing material that may be offensive to some viewers, or should not be accessed by minors. The Platform for Internet Content Selection (PICS) (see www.w3.org/pub/WWW/PICS and other documents at www.w3.org) is a web page ratings standard similar in principle to the ratings systems for motion pictures. This system allows page authors to “internally” self-classify their pages through use of the “<meta . . . >” HTML element. Alternatively, “external” PICS ratings of web pages may be obtained from ratings service providers accessed each time a URL is selected. In practice, the ratings service providers have adopted a very limited range of web page classifications. For example, Ararat Software's Commercial Rating System (see www.ararat.com.ratings/ararat10.html) provides just 5 categories of web page content: commercial content, technical/customer support, ordering information, downloading information and contact information. In other examples, CyberPatrol (www.microsys.com/pics/pics_msi.htm) provides 16 categories, the Recreational Software Advisory Council (www.rsac.org/faq.html) provides 4 categories, SafeSurf (www.safesurf.com/ssplan.htm) provides 11 categories and Vancouver Webpages Rating Service (vancouver-webpages.com/VWP1.0/) provides 11 categories. None of the categories provide classification of web pages by industry, service, product or subject with sufficient specificity to be useful when searching for web pages. Rather, the categories are intended to prevent web browsers from di
Amsbury Wayne
Cermak Shelly Guest
Mills Dudley John
Shanks & Herbert