Information retrieval from hierarchical compound documents

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

: 1999-09-28
: 2003-04-22
: Shah, Sanjiv (Department: 2172)

: C707S793000
: Reexamination Certificate
: active
: 06553364
: ABSTRACT:

COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the xerographic reproduction by anyone of the patent document or the patent disclosure in exactly the form it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
The present invention relates to the field of electronic document storage and management. More specifically, one embodiment of the invention provides a system for storing compound documents and searching the stored compound documents.
Information has recently undergone a transition from a scarce commodity to an overabundant commodity. With a scarce commodity, efforts are centered on acquiring the commodity, whereas with an overabundant commodity, efforts are centered on filtering the commodity to make it more valuable. The prime example of this phenomenon is the explosion of information resulting from the growth of the global internetwork of networks known as the “Internet.” Networks and computers connected to the Internet use TCP/IP (Transmission Control Protocol/Internet Protocol) to pass data packets reliably from a source node to a destination node. A variety of higher level protocols are used on top of TCP/IP to transport objects of digital data, the particular protocol depending on the nature of the objects. For example, e-mail is transported using the Simple Mail Transfer Protocol (SMTP) and the Post Office Protocol 3 (POP3), while files are transported using the File Transfer Protocol (FTP). Hypertext documents and their associated effects are transported using the Hypertext Transfer Protocol (HTTP).
When many hypertext documents are linked to other hypertext documents, they collectively form a “web” structure, which led to the name “World Wide Web” (often shortened to “WWW” or “the Web”) for the collection of hypertext documents that can be transported using HTTP. Of course, hyperlinks are not required in a document for it to be transported using HTTP. In fact, any object can be transported using HTTP, so long as it conforms to the requirements of HTTP.
In a typical use of HTTP, a browser sends a uniform resource locator (URL) to a Web server and the Web server returns a Hypertext Markup Language (HTML) document for the browser to display. The browser is one example of an HTTP client and is so named because it displays the returned hypertext document and allows the user an opportunity to select and display other hypertext documents referenced in the returned document. The Web server is an Internet node which returns hypertext documents requested by HTTP clients.
Some Web servers, in addition to serving static documents, can return dynamic documents. A static document is a document which exists on a Web server before a request for it is made; the Web server merely sends it out upon request. A static page URL is typically in the form of “host.subdomain.domain.TLD/path/file” or the like. That static page URL refers to a document named “file” which is found on the path “/path/” on the machine which has the domain name “host.subdomain.domain.TLD”. For example, the actual domain name “www.yahoo.com” refers to the machine (or machines) designated “www” at the domain “yahoo” in the “.com” top-level domain (TLD). By contrast, a dynamic document is a document which is generated by the Web server when it receives a particular URL which the server identifies as a request for a dynamic document.
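The static page URL structure described above can be illustrated with a minimal sketch that splits a URL into machine name, top-level domain, path, and file name. The function name `describe_static_url` and the sample path “/path/file” are assumptions made for illustration; they are not part of the specification.

```python
from urllib.parse import urlsplit


def describe_static_url(url):
    """Split a static page URL into host, machine, TLD, path, and file name."""
    # urlsplit only recognizes the host when a scheme is present,
    # so prepend one if the URL was written without it.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Everything before the last "/" is the path; the remainder is the file.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "host": host,
        "machine": labels[0],   # e.g. "www"
        "tld": labels[-1],      # e.g. "com"
        "path": directory + "/",
        "file": filename,
    }


# Hypothetical URL following the "host.subdomain.domain.TLD/path/file" form.
info = describe_static_url("www.yahoo.com/path/file")
```

Here `info["machine"]` is the machine designation, `info["tld"]` is the top-level domain, and `info["path"]` and `info["file"]` locate the document on that machine.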
Many Web servers operate “Web sites” which offer a collection of linked hypertext documents controlled by a single person or entity. Since the Web site is controlled by a single person or entity, the hypertext documents, often called “Web pages” in this context, have a consistent look and subject matter. Especially in the case of Web sites put up by commercial interests selling goods and services, the hyperlinked documents which form a Web site will have few, if any, links to pages not controlled by the interest. The terms “Web site” and “Web page” are often used interchangeably, but herein a “Web page” refers to a single hypertext document which forms part of a Web site and “Web site” refers to a collection of one or more Web pages which are controlled (i.e., modifiable) by a single entity or group of entities working in concert to present a site on a particular topic.
With all the many sites and pages that the many millions of Internet users might make available through their Web servers, it is often difficult to find a particular page or determine where to find information on a particular topic. There is no “official” listing of what is available, because anyone can place anything on their Web server without reporting it to an official agency, and because the Web changes so quickly. In the absence of an official “table of contents”, several approaches to indexing the Web have been proposed.
One approach is to index all of the Web documents found everywhere. While this approach is useful to find a document on a rarely discussed topic or a reference to a person with an uncommon first or last name, it often leads to excessive numbers of “hits.” Another approach is to summarize and categorize Web documents and make the summaries searchable by category.
In either case, a typical search engine searches for search terms in each candidate document and returns a list of the documents which meet the search criteria. Unfortunately, the information to be gained from the interrelationships of documents is lost. From the above it is seen that an improved search system which takes into account the interrelationships between documents is needed.
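The limitation described above can be seen in a minimal sketch of a flat keyword search, in which each candidate document is tested in isolation. The function `flat_search` and the sample documents are hypothetical and serve only as an illustration.

```python
def flat_search(documents, query_terms):
    """Return names of documents whose own text contains every query term."""
    terms = [t.lower() for t in query_terms]
    hits = []
    for name, text in documents.items():
        words = set(text.lower().split())
        # Each document is judged on its own text only.
        if all(t in words for t in terms):
            hits.append(name)
    return hits


# Hypothetical documents: the hotels page sits below the travel page,
# but a flat search has no knowledge of that relationship.
documents = {
    "recreation/travel": "travel guides and destinations",
    "recreation/travel/hotels": "hotels and lodging listings",
}
flat_hits = flat_search(documents, ["travel", "hotels"])
single_hits = flat_search(documents, ["hotels"])
```

In this sketch the two-term query returns no documents, because neither page contains both terms by itself, even though the hotels page is plainly about travel by virtue of where it sits.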
SUMMARY OF THE INVENTION
An improved search system which takes into account interrelationships among documents by searching across links is provided by virtue of the present invention. In one embodiment of the present invention, the documents are referenced in a hierarchical document repository used for keyword and topical searches. A search query applied to the hierarchy returns documents which directly match a search query term, or which indirectly match the search query term by being a child document in the hierarchy of a parent document matching all or part of the query term. In a preferred embodiment, a returned document matches at least one subterm of the query term directly.
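One way to picture the direct and indirect matching described above is the following minimal sketch. It is an illustration under stated assumptions, not the implementation disclosed in the specification: the `Node` class, the `hierarchical_search` function, the `require_direct` flag, and the sample hierarchy are all hypothetical, and matching is simplified to whole-word comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A document in the hierarchy, with its own text and child documents."""
    name: str
    text: str
    children: list = field(default_factory=list)


def hierarchical_search(root, subterms, require_direct=True):
    """Return names of documents that cover every subterm directly or via ancestors."""
    wanted = {t.lower() for t in subterms}
    results = []

    def visit(node, inherited):
        # Subterms matched by this document's own text (direct matches).
        own = wanted & set(node.text.lower().split())
        # Subterms matched here or by any ancestor (indirect matches).
        covered = own | inherited
        if covered == wanted and (own or not require_direct):
            results.append(node.name)
        for child in node.children:
            visit(child, covered)

    visit(root, set())
    return results


# Hypothetical hierarchy: recreation > travel > {hotels, airlines}.
root = Node("recreation", "recreation", [
    Node("travel", "travel guides", [
        Node("hotels", "hotels and lodging"),
        Node("airlines", "airlines and flights"),
    ]),
])

both = hierarchical_search(root, ["travel", "hotels"])
strict = hierarchical_search(root, ["travel"])
loose = hierarchical_search(root, ["travel"], require_direct=False)
```

With `require_direct=True`, the sketch mirrors the preferred embodiment in which a returned document must match at least one subterm itself; with it set to `False`, children of a matching parent are returned on the strength of the parent alone.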
One advantage of the present invention is that it provides for efficient storage of hierarchical data while allowing searches to be performed taking into account relationships among data elements in a hierarchy.
A further understanding of the nature and advantages of the inventions herein may be realized by reference to the remaining portions of the specification and the attached drawings.

Affiliated with

Wu Jiong

Inventor

Also associated with

Shah Sanjiv

Examiner

Townsend & Townsend & Crew LLP

Law Firm

Yahoo! Inc.

Corporate Assignee
