Computer method and apparatus for determining site type of a...

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate


C707S793000, C707S793000, C707S793000, C706S031000, C706S061000, C709S203000, C709S219000

active

06778986

BACKGROUND OF THE INVENTION
Generally speaking, a global computer network, e.g., the Internet, is formed of a plurality of computers coupled to a communication line for communicating with each other. Each computer is referred to as a network node. Some nodes serve as information-bearing sites, while other nodes provide connectivity between end users and the information-bearing sites.
The explosive growth of the Internet has made it an essential component of the strategy of every business, organization and institution, and has led to massive amounts of information being placed in the public domain for people to read and explore. The types of information available range from information about companies and their products, services, activities, people and partners, to information about conferences, seminars and exhibitions, to news sites, to information about universities, schools, colleges, museums and hospitals, to information about government organizations, their purpose, activities and people. The Internet has become every organization's venue of choice for providing pertinent, detailed and timely information about itself, its cause, services and activities.
The Internet is essentially the network infrastructure that connects geographically dispersed computer systems. Every such computer system may contain publicly available (shareable) data that are available to users connected to this network. However, until the early 1990s there was no uniform way or standard convention for accessing this data. Users had to use a variety of techniques to connect to remote computers (e.g., telnet, ftp, etc.) using passwords that were usually site-specific, and they had to know the exact directory and file name that contained the information they were looking for.
The World Wide Web (WWW or simply Web) was created in an effort to simplify and facilitate access to publicly available information from computer systems connected to the Internet. A set of conventions and standards was developed that enabled users to access every Web site (computer system connected to the Web) in the same uniform way, without the need for special passwords or techniques. In addition, Web browsers became available that let users navigate easily through Web sites by simply clicking hyperlinks (words or sentences connected to some Web resource).
Today the Web contains more than one billion pages that are interconnected with each other and reside in computers all over the world (thus the term “World Wide Web”). The sheer size and explosive growth of the Web have created the need for tools and methods that can automatically search, index, access, extract and recombine information and knowledge that is publicly available from Web resources.
As used herein, the following terms have the indicated definitions.
Web Domain
Web domain is an Internet address that provides connection to a Web server (a computer system connected to the Internet that allows remote access to some of its contents).
URL
URL stands for Uniform Resource Locator. Generally, URLs have three parts: the first describes the protocol used to access the content pointed to by the URL, the second identifies the domain and the directory in which the content is located, and the third names the file that stores the content:
<protocol>://<domain><directory><file>
For example:
http://www.corex.com/bios.html
http://www.cardscan.com/index.html
http://fn.cnn.com/archives/may99/pr37.html
ftp://shiva.lin.com/soft/words.zip
Commonly, the <protocol> part may be missing. In that case, modern Web browsers access the URL as if the http:// prefix were used. In addition, the <file> part may be missing. In that case, the convention calls for the file “index.html” to be fetched.
For example, the following are legal variations of the previous example URLs:
www.corex.com/bios.html
www.cardscan.com
fn.cnn.com/archives/may99/pr37.html
ftp://shiva.lin.com/soft/words.zip
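As a rough illustration of the URL convention above, the following minimal sketch splits a URL into the four components of the template and applies the two stated defaults (http:// when the protocol is omitted, “index.html” when the file is omitted). The names `split_url` and `UrlParts` are hypothetical, and treating the last path segment as the file whenever there is no trailing slash is an assumption; real servers may resolve such paths differently.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class UrlParts(NamedTuple):
    """Hypothetical container for <protocol>, <domain>, <directory>, <file>."""
    protocol: str
    domain: str
    directory: str
    file: str


def split_url(url: str) -> UrlParts:
    """Split a URL, applying the default protocol and default file."""
    if "://" not in url:
        url = "http://" + url  # missing protocol: behave as if http:// were used
    parts = urlsplit(url)
    path = parts.path or "/"  # a bare domain maps to the root directory
    cut = path.rfind("/") + 1  # assumption: last segment is the file
    directory, file = path[:cut], path[cut:]
    if not file:
        file = "index.html"  # missing file: conventional default document
    return UrlParts(parts.scheme, parts.netloc, directory, file)


for u in ["www.corex.com/bios.html", "www.cardscan.com",
          "fn.cnn.com/archives/may99/pr37.html",
          "ftp://shiva.lin.com/soft/words.zip"]:
    print(split_url(u))
```

With this sketch, `www.cardscan.com` yields protocol `http`, directory `/` and file `index.html`, matching the defaults described in the text.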
Web Page
Web page is the content associated with a URL. In its simplest form, this content is static text stored in a text file indicated by the URL. Very often, however, the content contains multimedia elements (e.g., images, audio, video, etc.) as well as non-static text or other elements (e.g., news tickers, frames, scripts, streaming graphics, etc.). A Web page is often formed from more than one file; however, only one file is associated with the URL, and that file initiates or guides the generation of the Web page.
Web Browser
Web browser is a software program that allows users to access the content stored in Web sites. Modern Web browsers can also create content “on the fly”, according to instructions received from a Web site. This concept is commonly referred to as “dynamic page generation”. In addition, browsers can commonly send information back to the Web site, thus enabling two-way communication between the user and the Web site.
There are many different types of Web sites, distinguished by the type of content they publish, their purpose, or the type of owner (e.g., company, government, educational institution, etc.). Identifying the type of a Web site is important for computer programs that traverse, index, or extract information from Web sites (e.g., search engines, Web data mining applications, etc.). When the site type is known, these programs can selectively visit only the “useful” parts of the site while skipping other parts, or even the whole site (e.g., Internet robots that search for company or people information may skip pornographic sites completely). In addition, the type of a Web site is necessary for estimating how frequently its content changes: news sites may change their content daily, organization sites less frequently, and personal sites (owned by individuals) even less frequently. Internet robots can implement appropriate schedules for visiting a site based on this estimate.
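A robot's visiting schedule driven by site type could look roughly like the minimal sketch below. Only the ordering of the intervals (news more often than organization sites, organization sites more often than personal sites) and the idea of skipping some types entirely come from the text; the concrete intervals, the fallback interval, the `adult` skip type and the function names are hypothetical placeholders.

```python
from datetime import datetime, timedelta

# Hypothetical intervals: only their ordering reflects the text above.
REVISIT_INTERVAL = {
    "news": timedelta(days=1),
    "organization": timedelta(days=30),
    "personal": timedelta(days=90),
}
DEFAULT_INTERVAL = timedelta(days=30)  # assumed fallback for other types
SKIP_TYPES = {"adult"}  # hypothetical: site types the robot never visits


def next_visit(site_type: str, last_visit: datetime) -> datetime:
    """Return when a site of the given type should next be fetched."""
    return last_visit + REVISIT_INTERVAL.get(site_type, DEFAULT_INTERVAL)


def due_sites(sites, now):
    """Yield URLs that are due for a visit at time `now`.

    `sites` is an iterable of (url, site_type, last_visit) tuples.
    """
    for url, site_type, last_visit in sites:
        if site_type in SKIP_TYPES:
            continue  # skip the whole site
        if next_visit(site_type, last_visit) <= now:
            yield url


sites = [
    ("fn.cnn.com", "news", datetime(2000, 7, 29)),
    ("www.corex.com", "organization", datetime(2000, 7, 20)),
]
print(list(due_sites(sites, datetime(2000, 7, 31))))
```

Here the news site is due again after one day, while the organization site visited eleven days earlier is not yet due.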
Furthermore, identifying the site type is very helpful in deducing the structure of the site. Broad categories of sites share the same meta-structure, for example, company sites usually have the following sections:
“About” section, with general information and description of the company
“Contact” section, with contact information
“Products/Services” section
“News” section, with press releases and news articles relevant to the company
“Employment opportunities” section, with a list of current job openings in the company
whereas news sites usually include the following sections:
Current news
Local news
World news
Archives (archived news)
Business section (with business news)
Technology section (with technology news)
When the site type is identified, this general meta-structure provides a blueprint for the expected actual site structure. This blueprint is a significant aid to Web software robots and data extraction tools that visit and extract information from Web sites.
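One way a robot might use such a blueprint is sketched below: each expected section of a site type is associated with a few keywords, and the links on a home page are matched against them so that only the expected sections are visited. The keyword lists, the substring matching rule and the `match_sections` name are illustrative assumptions, not a method stated in the text.

```python
# Hypothetical keyword lists derived from the section names listed above.
BLUEPRINTS = {
    "company": {
        "about": ["about"],
        "contact": ["contact"],
        "products": ["product", "service"],
        "news": ["news", "press"],
        "employment": ["job", "career", "employment"],
    },
    "news": {
        "current": ["current", "today", "latest"],
        "local": ["local"],
        "world": ["world"],
        "archives": ["archive"],
        "business": ["business"],
        "technology": ["tech"],
    },
}


def match_sections(site_type, links):
    """Map each expected section to the links whose text or URL mentions it.

    `links` is a list of (anchor_text, href) pairs taken from a home page.
    Sections with no matching link are left out of the result.
    """
    blueprint = BLUEPRINTS.get(site_type, {})
    found = {}
    for section, keywords in blueprint.items():
        hits = [href for text, href in links
                if any(k in (text + " " + href).lower() for k in keywords)]
        if hits:
            found[section] = hits
    return found


links = [("About Us", "/about.html"), ("Careers", "/jobs/"),
         ("Press Releases", "/press/index.html"), ("Home", "/")]
print(match_sections("company", links))
```

For these sample links, the sketch finds the “About”, “News” and “Employment opportunities” sections of a company site and ignores the home link.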
SUMMARY OF THE INVENTION
The purpose of this invention is to automatically classify a Web site into an appropriate type. The potential types may vary, depending on the purpose of the classification. For example, when the purpose of classification is to determine visiting frequency for an Internet robot, the set of potential types will be based on how frequently the site changes its contents, and may be the following:
{Daily, Weekly, Monthly, Bimonthly, Quarterly, Semiannually, Annually}
On the other hand, if the purpose of classification is to guide Internet robots into visiting certain sections of the site while avoiding others, then the set of potential site types may include the following:
{Company, News, Portal, Government, Hospital, University, Military, Personal}
This invention describes the general mechanism for classifying among any given set of potential types.
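The classification mechanism itself is not described in this excerpt, so the sketch below substitutes a well-known generic technique, a multinomial naive Bayes classifier over page words with add-one smoothing, purely to show how a single mechanism can classify among whatever set of types it is given. It is not the patented method. The class name, the tokenizer and the tiny training examples are hypothetical; the labels could equally be drawn from the change-frequency set or the site-category set above.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lower-case alphabetic words; a deliberately simple tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


class SiteTypeClassifier:
    """Multinomial naive Bayes over page words (illustrative stand-in).

    The set of potential types is whatever labels appear in the examples.
    """

    def __init__(self, examples):
        # examples: list of (page_text, site_type) pairs
        self.word_counts = {}
        self.doc_counts = Counter()
        self.vocab = set()
        for text, label in examples:
            self.doc_counts[label] += 1
            words = tokens(text)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.vocab.update(words)
        self.total_docs = sum(self.doc_counts.values())

    def classify(self, text):
        """Return the label with the highest log-posterior score."""
        best_label, best_score = None, -math.inf
        v = len(self.vocab)
        for label, counts in self.word_counts.items():
            total = sum(counts.values())
            score = math.log(self.doc_counts[label] / self.total_docs)
            for w in tokens(text):
                # Add-one (Laplace) smoothing so unseen words never give log(0)
                score += math.log((counts[w] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


examples = [
    ("about us products services careers contact", "Company"),
    ("world news local news archives business technology", "News"),
    ("my family photos my hobbies", "Personal"),
]
clf = SiteTypeClassifier(examples)
print(clf.classify("latest world news and business headlines"))
```

With these toy examples the query text is labeled `News`; a practical system would of course need far more training pages and richer features than home-page words.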
Examples of applications that benefit directly from automatic Web site classification are Inventions 5 and 6 as disclosed in the related Provisional Application No. 60/221,750 filed on Jul. 31, 2000 for a “Computer Database Method and Apparatus”.
A preferred embodiment is a software program formed of a preparation p
