Reexamination Certificate
1999-12-27
2004-11-30
Feild, Joseph (Department: 2176)
Data processing: presentation processing of document, operator i
Presentation processing of document
Layout
C707S793000
active
06826724
ABSTRACT:
FIELD OF THE INVENTION
The present invention relates to a document processor for displaying and printing multiple input document data in a predetermined format, a document processing method, and a computer-readable recording medium for recording a program to execute the method on a computer. Furthermore, this invention relates to a document classification device and a document classification method for classifying multiple input document data based on the contents thereof, and particularly for refining classification categories calculated during document classification, and to a computer-readable recording medium for recording a program to execute the method on a computer.
BACKGROUND OF THE INVENTION
Various document classification devices and document retrieval devices have been developed in recent years. The proliferation of network technology, such as the Internet, has made it possible to access a huge number of electronic documents, domestically and overseas, and the amount of data stored electronically has expanded rapidly in proportion. Accordingly, there is an increasing need for intellectual operations such as classifying large collections of document data into meaningful categories.
The benefits of classifying large amounts of document data according to their meaning are as follows. Firstly, it makes it easier to retrieve data. Retrieval becomes relatively easy since vast groups of documents can be retrieved using category names as clues.
Secondly, entire groups of data can be grasped. That is, it is possible to grasp the contents (individual classifications) of an entire cluster of documents. However, when a large amount of document data is classified by an operator, although accurate classification can be achieved, classification requires enormous manpower and time. Consequently, in view of the huge amount of documents stored in recent years, devices for automatically classifying document data have been proposed.
As an example of a conventional device for automatically classifying documents, Japanese Patent Application Laid-open (JP-A) No. 7-36897 discloses a device which defines a document as a document vector characterized by a word, uses clustering to group these document vectors, and automatically classifies the documents based on the grouped document vectors.
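The general idea of representing documents as word-based vectors and grouping them by clustering can be sketched as follows. This is only a minimal illustration and not the method of the cited application: the function names, the greedy single-pass clustering scheme, the cosine similarity measure, and the threshold value are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def document_vector(text):
    """Bag-of-words vector: maps each lowercase word to its count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(texts, threshold=0.3):
    """Greedy single-pass clustering (an assumed, simplified scheme).

    Each document joins the first cluster whose centroid (summed word
    counts) is at least `threshold` similar; otherwise it starts a new
    cluster. Returns a list of clusters, each a list of document indices.
    """
    centroids = []
    clusters = []
    for index, text in enumerate(texts):
        vector = document_vector(text)
        for centroid, members in zip(centroids, clusters):
            if cosine_similarity(vector, centroid) >= threshold:
                centroid.update(vector)  # fold the document into the centroid
                members.append(index)
                break
        else:
            # No sufficiently similar cluster was found: start a new one.
            centroids.append(Counter(vector))
            clusters.append([index])
    return clusters


# Hypothetical sample documents, invented for this illustration.
documents = [
    "printer paper jam in the paper tray",
    "paper jam when the printer feeds paper",
    "network login password expired",
    "password reset for network login",
]
clusters = cluster_documents(documents)
```

With these sample documents, the two printer-related texts share most of their words and fall into one group, while the two login-related texts form another. The grouping depends entirely on word overlap, which is why the resulting clusters reflect statistical word behaviour and not necessarily the categories an operator has in mind.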
Furthermore, in “Projections for Efficient Document Clustering” (Authors: Hinrich Schütze and Craig Silverstein, Academy: ACM, Title of Paper: Proceedings of SIGIR, pages: 78-81, Year of Publication: 1997), documents are classified in latent semantic space. Other conceivable methods include using a probability theory approach, etc.
Furthermore, in recent years, the proliferation of the Internet and the like has made it possible to access large numbers of document clusters, and as a result, there is an increasing need to be able to use these document clusters effectively, and in accordance with the intentions of a variety of users. To accomplish this, an intellectual operation is starting to be used in which a large number of document clusters are classified into meaningful categories, and the structure of the document clusters is grasped. However, when this type of classification is performed manually, enormous manpower and time are required. Further, since only the classifier knows how to classify the document data, classification standards change when the person responsible for classification is replaced.
Consequently, there is a demand for a document classification device capable of automatically classifying groups of documents according to the same type of classification standards used by humans. For example, Japanese Patent Application Laid-open (JP-A) No. 7-114572 discloses a document classification device that automatically extracts a word characteristic vector from a document and classifies the document based on that characteristic vector, thereby making it possible to classify documents automatically using meaningful differences.
However, since the conventional document classification device described above uses a method for statistically classifying documents arranged in multi-dimensional space essentially comprising words, the result of the classification is nothing more than the statistically determined behaviour of the words. Consequently, clusters (partial groups of individual classified documents) calculated after classification are sometimes incomprehensible to the operator (user).
A further problem is that the question of what kind of classification is appropriate depends on the characteristics of the document clusters to be classified and the intentions of the user, making it difficult to define an appropriate classification. In particular, when grasping entire data groups as mentioned above, the type of classification required will differ depending on the widely varying intentions of the operators, and it will be difficult to obtain the result desired by the operator in a single classification.
Thus, the problem can be interpreted by saying that a document classification result includes a great amount of noise, only one part of which is of use to the operator.
Furthermore, the conventional technology does not consider the constituent units of a document. When the structure of a document is partitioned by one or more period symbols, titles, and the like, a single document contains multiple topics and meanings. As a result, the user may find the classification categories difficult to understand, a category may be limited to a specific topic or specific meaning, or the document may be classified under a category different from the one intended by the user.
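To make the notion of constituent units concrete, the sketch below splits one document into units at titles and sentence-ending periods. It is a hypothetical illustration of the problem described above, not a procedure taken from the specification: the function name, the splitting rules (blank lines and periods followed by whitespace), and the sample report are all assumptions.

```python
import re


def split_into_units(text):
    """Split a document into units at blank lines and sentence-ending periods.

    A block with no terminal period (for example a title line) is kept as
    its own unit. These splitting rules are assumed for illustration only.
    """
    units = []
    for block in re.split(r"\n\s*\n", text.strip()):
        # Break each block after every period that is followed by whitespace.
        for sentence in re.split(r"(?<=\.)\s+", block.strip()):
            if sentence:
                units.append(sentence)
    return units


# Hypothetical sample document covering two unrelated topics.
report = (
    "Printer maintenance\n\n"
    "The paper tray jams often. Replace the feed roller.\n\n"
    "Network access\n\n"
    "Passwords expire every month."
)
units = split_into_units(report)
```

Here a single report yields five units spread over two unrelated topics. Classifying the report as one vector would mix both topics, which is the situation in which a document can end up under a category the user did not intend.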
A context-dependent automatic classification device is disclosed in Japanese Patent Application Laid-open (JP-A) No. 6-176064, and aims to increase classification precision by automatically classifying documents in consideration of the conclusive data therein, but essentially does not solve the problems mentioned above.
Furthermore, conventional document processors, such as the document classification device and document retrieval device described above, merely classify or retrieve documents, and give no consideration to further analysis of information hidden in the document clusters. Consequently, they have the disadvantage that a separate analyzing device must be used to analyze information hidden in the document clusters.
Furthermore, the operator who wishes to analyze the information does not perform classification and retrieval as an end in itself, but simply as an intermediate step during his analysis of the information. After classification and retrieval, in order to grasp the result more easily it is usually necessary to derive a meaningful result from the information analysis by repeating a variety of other processes, such as maximizing the practical usefulness of the information included in the original document, rearranging the result, carrying out totalization and statistical processing, and drawing up charts and graphs based on the results.
Furthermore, table-calculating software is sometimes needed when analyzing information about numerical data. However, table-calculating software was originally developed to handle numerical data, and is not sufficiently effective for analyzing textual data, particularly when the analysis concerns the meaning of documents.
SUMMARY OF THE INVENTION
This invention has been achieved in order to solve the problems of the conventional examples described above. It is a first object of the present invention to provide a document processor, a document processing method, and a computer-readable recording medium storing programs for executing the method on a computer, for carrying out analysis concerning the meaning of documents, not simply by outputting the results of fixed functions such as classification and retrieval, but by supporting a complete range of information analysis.
To solve the problems of the conventional example described above, it is a second object of the present invention to provide a document cl
Kenmochi Eiji
Miyachi Tatsuo
Nagatsuka Tetsuro
Shimada Atsuo
Takeya Kazutoshi
Feild Joseph
Nguyen Maikhanh
Ricoh & Company, Ltd.