Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
1997-06-03
2001-10-02
Herndon, Heather R. (Department: 2176)
C707S793000
active
06298357
ABSTRACT:
BACKGROUND OF THE INVENTION
The present invention relates to techniques that identify and categorize paragraphs, subparagraphs, and structural groupings in electronic documents, and more particularly to techniques that build a structure hierarchy from structural groupings.
An electronic document typically has information content, such as text, graphics, and tables, and formatting information that directs how the content is to be displayed. An electronic document resides on a digital, though not necessarily electronic, computer storage medium. An electronic document is generally provided by an author, distributor, or publisher who desires that the document be viewed with the appearance with which it was created. Electronic documents may be widely distributed and, therefore, can be viewed on a great variety of hardware and software platforms. A hypertext document is an electronic document with links, which are explicit, user-selectable navigation elements.
Generally, electronic and human-perceptible documents include a set of paragraphs. Each instance of a paragraph shares characteristics with other paragraphs. Paragraphs that share visual characteristics can be considered to be of the same structural type. Examples of structural paragraph types are titles, headers, and footnotes.
In addition, in all documents, paragraphs can have subparagraphs, which are character streams. Each instance of a subparagraph shares similar characteristics with other subparagraphs of the same structural type. Examples of subparagraph structural types are book titles, quotations, and foreign words and phrases.
A document typically has a logical organization. Within the logical organization are identifiable structural groups. A series of chapters containing paragraphs is an example of a structural group, as is a section that contains a heading, several paragraphs, and a bulleted list.
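The nesting of structural groups described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each paragraph has already been given a heading level (0 for ordinary content, 1 and above for headings); the function name `build_hierarchy` and the dictionary layout are hypothetical and are not taken from the patent.

```python
def build_hierarchy(items):
    """Nest a flat list of (level, text) pairs into a tree.

    Assumption: level 0 marks ordinary content; levels 1, 2, ... mark
    headings, with smaller numbers enclosing larger ones.
    """
    root = {"text": "document", "level": 0, "children": []}
    stack = [root]  # open groups, outermost first
    for level, text in items:
        if level == 0:
            # Ordinary content belongs to the innermost open group.
            stack[-1]["children"].append({"text": text, "level": None, "children": []})
            continue
        # Close every open group at the same or a deeper heading level.
        while len(stack) > 1 and stack[-1]["level"] >= level:
            stack.pop()
        node = {"text": text, "level": level, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


items = [
    (1, "Chapter 1"),
    (2, "Section 1.1"),
    (0, "A paragraph in Section 1.1."),
    (1, "Chapter 2"),
    (0, "A paragraph in Chapter 2."),
]
tree = build_hierarchy(items)
```

Here the two chapters become children of the document root, and the section with its paragraph forms a group inside the first chapter.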
Organizing components in an electronic document by structural type permits an electronic document development system to perform global operations on all instances of the same type within the electronic document. For example, the FrameMaker® document publishing system, available from Adobe Systems Incorporated of San Jose, Calif., can globally change the justification of all paragraphs tagged as a particular type in the electronic document and can globally change the font size of all characters tagged as a particular type in the electronic document.
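A global operation by structural type can be sketched as follows. This is an illustrative assumption only and is not FrameMaker's interface: the record layout (`type`, `text`, `attrs`) and the function `apply_to_type` are hypothetical names chosen for the example.

```python
def apply_to_type(instances, structure_type, **changes):
    """Apply attribute changes to every instance tagged with one structural type."""
    changed = 0
    for instance in instances:
        if instance["type"] == structure_type:
            instance["attrs"].update(changes)
            changed += 1
    return changed


# Hypothetical tagged document: each instance carries its type and attributes.
document = [
    {"type": "heading", "text": "Introduction", "attrs": {"justify": "left", "size_pt": 14}},
    {"type": "body", "text": "First paragraph.", "attrs": {"justify": "left", "size_pt": 10}},
    {"type": "body", "text": "Second paragraph.", "attrs": {"justify": "left", "size_pt": 10}},
]

# Change the justification of all body paragraphs in one call.
changed = apply_to_type(document, "body", justify="full")
```

Because the change is keyed on the type tag, the heading is left untouched while both body paragraphs are updated.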
Standard type formats exist for particular uses and for particular systems. For example, the HyperText Markup Language (HTML) uses the embedded tags <P> and </P> to delimit paragraphs, and <B> and </B> to delimit bold text. HTML also specifies many other tags including tags for titles, menus, definitions, quoted blocks, and heading styles. For an electronic document to have the desired visual appearance when viewed with a World Wide Web browser, the electronic document must have the appropriate HTML tags.
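Once instances carry structural types, producing HTML tags of the kind described above might look like the following sketch. The mapping from type names to tags (`TAG_FOR_TYPE`) is an assumption made for illustration; the patent text does not prescribe one.

```python
from html import escape

# Hypothetical mapping from structural type to an HTML element name.
TAG_FOR_TYPE = {"title": "H1", "body": "P"}


def to_html(instances):
    """Wrap each (structure_type, text) pair in the tag assumed for its type."""
    lines = []
    for structure_type, text in instances:
        tag = TAG_FOR_TYPE.get(structure_type, "P")  # unknown types fall back to <P>
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)


html_out = to_html([("title", "Overview"), ("body", "Fish & chips")])
print(html_out)
# <H1>Overview</H1>
# <P>Fish &amp; chips</P>
```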
When viewed on paper or on a computer display, the different structural paragraph types in a document, such as headings and lists, are readily identifiable. However, enabling a system to perform operations based on structural types, such as modifying, rearranging, displaying, or printing a document, generally requires that someone examine and tag all paragraphs and subparagraphs manually according to their visually recognized structural type. This process is tedious, time-consuming, and often impracticable for large documents.
SUMMARY OF THE INVENTION
In accordance with the present invention, a method of extracting structure information from an electronic document includes the step of identifying a structure type for each instance in the electronic document by examining presentation attributes associated with each instance. With such an arrangement, an unstructured electronic document can be provided with structural tags.
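As a minimal sketch of identifying structure types from presentation attributes, the example below gives paragraphs with identical attributes the same type label. The attribute set (font, size, bold), the exact-match grouping rule, and all names are assumptions for illustration; the summary above does not specify how attributes are compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Presentation:
    """Hypothetical presentation attributes of a paragraph."""
    font: str
    size_pt: float
    bold: bool = False


@dataclass
class Paragraph:
    text: str
    style: Presentation


def assign_structure_types(paragraphs):
    """Label paragraphs so that identical presentation attributes share a type."""
    type_of_style = {}
    labels = []
    for paragraph in paragraphs:
        if paragraph.style not in type_of_style:
            # First time this attribute combination is seen: mint a new label.
            type_of_style[paragraph.style] = f"type{len(type_of_style) + 1}"
        labels.append(type_of_style[paragraph.style])
    return labels


doc = [
    Paragraph("Chapter 1", Presentation("Helvetica", 18.0, bold=True)),
    Paragraph("Body text one.", Presentation("Times", 10.0)),
    Paragraph("Body text two.", Presentation("Times", 10.0)),
    Paragraph("Chapter 2", Presentation("Helvetica", 18.0, bold=True)),
]
labels = assign_structure_types(doc)
print(labels)  # ['type1', 'type2', 'type2', 'type1']
```

The labels are arbitrary placeholders; a fuller system would still need some way to decide which label corresponds to a title, a heading, or body text.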
Among the advantages of the invention are one or more of the following. The invention enables an electronic document development system to perform global operations on all paragraph and subparagraph instances by structural type. Global operations include, but are not limited to, format changes, searches, word and phrase replacements, and extractions. The invention enables the electronic document development system to perform operations on the structure of the electronic document (e.g., rearrange the hierarchy or subdivide the structure). The invention permits an electronic document to be rearranged or divided according to structural groupings of the document. The invention enables an electronic document to be rearranged based on full sections and identifiable units. The invention enables the document to be split based on logical organization.
Other features and advantages of the invention will become apparent from the following description and from the claims.
Wexler Michael C.
Young Jeffrey E.
Adobe Systems Incorporated
Fish & Richardson P.C.
Herndon Heather R.
Huynh Cong-Lac