System and method for classifying legal concepts using legal...

Data processing: artificial intelligence – Machine learning

Reexamination Certificate

Details

C707S793000

active

06502081

ABSTRACT:

COPYRIGHT NOTICE
A portion of this disclosure, including Appendices, is subject to copyright protection. Limited permission is granted to facsimile reproduction of the patent document or patent disclosure as it appears in the U.S. Patent and Trademark Office (PTO) patent file or records, but the copyright owner reserves all other copyright rights whatsoever.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to systems and methods for automated classification. More specifically, the invention relates to automated systems and methods for classifying concepts (such as legal concepts, including points of law from court opinions) according to a topic scheme (such as a hierarchical legal topic classification scheme).
2. Related Art
Document classification has long been recognized as one of the most important tasks in text processing. Classification supports high-quality document retrieval and enables browsing and linking among documents across a collection. The benefits of such easy access are especially apparent in slowly evolving subject domains such as law, where the generally stable vocabularies and topics ensure a long-term return on any classification work.
There are two broad document classification approaches: unsupervised learning and supervised learning. The approaches are differentiated by whether a pre-defined classification scheme is used.
Unsupervised learning is a data-driven classification approach, based on the assumption that documents can be well organized by a natural structure inherent to the data. Those familiar with the data should be able to follow this natural structure to locate their information. A large body of information retrieval literature has focused on this approach, mostly in connection with document clustering [Borko 1963, Sparck Jones 1970, van Rijsbergen 1979, Griffiths 1984, Willett 1988, Salton 1990]. More recently, machine learning techniques have also been applied to this classification task [Farkas 1993], and the term “unsupervised learning” was coined to describe this approach. The following patents are associated with this approach: U.S. Pat. No. 5,182,708 and U.S. Pat. No. 5,832,470.
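To make the clustering idea concrete, the following is a minimal sketch of single-link hierarchic agglomerative clustering, one of the clustering families surveyed in the literature cited above. It is an illustration only, not the method of any cited work or patent: it assumes plain whitespace tokenization, raw term counts, cosine similarity, and a hypothetical stopping `threshold`, and the helper names (`vectorize`, `cosine`, `single_link_clusters`) are invented for the example.

```python
from collections import Counter
from math import sqrt


def vectorize(text):
    # Bag-of-words term counts; a real system would add stemming and stop-word removal.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def single_link_clusters(docs, threshold):
    """Single-link agglomerative clustering, stopped at a similarity threshold.

    Starts with one cluster per document and repeatedly merges the two clusters
    whose closest members are most similar, until no pair reaches `threshold`.
    Returns lists of document indices; no predefined topic scheme is used.
    """
    vecs = [vectorize(d) for d in docs]
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = max(cosine(vecs[i], vecs[j]) for i in clusters[x] for j in clusters[y])
                if sim > best:
                    best, pair = sim, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x].extend(clusters[y])  # y > x, so index x stays valid after the delete
        del clusters[y]
    return clusters


docs = [
    "contract breach damages",
    "breach of contract remedy",
    "search warrant probable cause",
    "warrant without probable cause",
]
print(single_link_clusters(docs, 0.5))  # [[0, 1], [2, 3]]
```

The groups emerge from word overlap alone, which is the sense in which this approach is data-driven: nothing tells the procedure that the first pair concerns contracts and the second concerns searches.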
In contrast to the unsupervised learning approach to document classification is supervised learning. With this approach, a pre-defined “topic scheme” is given, along with the classified documents for each topic in the scheme. The topic scheme may be a simple list of discrete topics or a complex hierarchical topic scheme. Supervised learning technology focuses on the task of feeding a computer meaningful topical descriptions so that it can learn to classify a document of unknown type.
When a topic scheme is a simple list of discrete topics (one without complex hierarchical relationships among the topics), document classification reduces to document categorization. Many machine learning techniques, including the retrieval technique of relevance feedback, have been tried for this task [Buckley 1994, Lewis 1994, and Mitchell 1997]. In addition to the effectiveness of the learning methods themselves, the success of automatic categorization depends on the number of topics in the scheme, on the amount of high-quality training documents, and on the degree to which the topics are mutually exclusive. An example is disclosed in U.S. Pat. No. 5,675,710.
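Relevance feedback is classically associated with Rocchio's formula, which builds a query (here, a topic prototype) by adding the average of relevant examples and subtracting a fraction of the average of non-relevant ones. The sketch below applies that idea to flat categorization as an illustration; it is not the specific method of any cited reference or patent. The weights `beta` and `gamma`, the whitespace tokenization, and the function names are assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import sqrt


def vectorize(text):
    # Raw term counts over whitespace tokens (a deliberate simplification).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    # Average of a non-empty list of sparse vectors.
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w / len(vectors)
    return total


def train_rocchio(training, beta=1.0, gamma=0.25):
    """Build one Rocchio-style prototype per topic of a flat topic list.

    training: list of (text, topic) pairs. Each prototype is
    beta * mean(in-topic vectors) - gamma * mean(out-of-topic vectors).
    """
    vecs = [(vectorize(text), topic) for text, topic in training]
    prototypes = {}
    for topic in {t for _, t in vecs}:
        pos = mean_vector([v for v, t in vecs if t == topic])
        negs = [v for v, t in vecs if t != topic]
        neg = mean_vector(negs) if negs else {}
        terms = set(pos) | set(neg)
        prototypes[topic] = {term: beta * pos.get(term, 0.0) - gamma * neg.get(term, 0.0)
                             for term in terms}
    return prototypes


def categorize(text, prototypes):
    # Assign the single topic whose prototype is most similar to the document.
    v = vectorize(text)
    return max(prototypes, key=lambda topic: cosine(v, prototypes[topic]))


training = [
    ("breach of contract damages", "Contracts"),
    ("contract formation offer acceptance", "Contracts"),
    ("search warrant probable cause", "Criminal Procedure"),
    ("miranda warning custodial interrogation", "Criminal Procedure"),
]
prototypes = train_rocchio(training)
print(categorize("damages for breach of a sales contract", prototypes))  # Contracts
```

The negative term is what lets mutually exclusive topics push apart. When topics overlap heavily, or training documents are few, the prototypes blur together, which is consistent with the dependencies on topic count, training data, and exclusivity noted above.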
A more difficult form of document classification uses a hierarchical topic scheme. In this task, one must consider horizontal relationships among sister topics, which tend to be close to one another and are therefore easily confused by a computer. One must also account for vertical inheritance relationships between parent and child topics.
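The two relationships can be made concrete with a small topic tree. In the following sketch, a concept filed under a subtopic implicitly belongs to every ancestor (vertical inheritance), while topics that share a parent are sisters (horizontal relationship). The topic names, the `PARENT` table, and the helper functions are hypothetical and serve only to illustrate the relationships. They do not describe the proprietary topic scheme referred to in this document.

```python
# Hypothetical fragment of a hierarchical topic scheme: child -> parent (None = root).
PARENT = {
    "Contracts": None,
    "Contracts/Formation": "Contracts",
    "Contracts/Remedies": "Contracts",
    "Contracts/Remedies/Damages": "Contracts/Remedies",
    "Contracts/Remedies/Specific Performance": "Contracts/Remedies",
}


def ancestors(topic, parent=PARENT):
    """Vertical inheritance: a concept filed under `topic` also falls under each ancestor."""
    path = []
    node = parent[topic]
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def sisters(topic, parent=PARENT):
    """Horizontal relationship: topics that share `topic`'s parent, often easily confused."""
    return sorted(t for t, p in parent.items() if p == parent[topic] and t != topic)


print(ancestors("Contracts/Remedies/Damages"))  # ['Contracts/Remedies', 'Contracts']
print(sisters("Contracts/Remedies/Damages"))    # ['Contracts/Remedies/Specific Performance']
```

A flat categorizer would treat all five topics as unrelated labels. A hierarchy-aware classifier would need to recognize, for example, that confusing two sister remedies is a smaller error than confusing a remedy with contract formation.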
Many machine learning techniques have trouble accommodating these two semantic relationships simultaneously during training, and consequently have difficulty classifying documents effectively. The task becomes even more challenging if the topic scheme is very large, if the training documents are not topically exclusive, if the documents are short, or if the documents lack descriptive information.
To face these challenges, some techniques (U.S. Pat. No. 5,204,812) have relied on human intervention. Others (U.S. Pat. No. 5,794,236) use simple but insightful pattern matching. Still others (U.S. Pat. Nos. 5,371,807 and 5,768,580) turn to linguistic knowledge to combat the ambiguity introduced in the hierarchical scheme.
However, these techniques can handle only small, domain-specific classification work. They do not scale well, either because their pattern recognition is too simple or because of the daunting cost of building the lexicons needed to support linguistic parsing.
Thus, there is a need in the art for an economical, scalable machine learning process that can perform document classification with high accuracy using a large, hierarchical topic scheme. The present invention is directed to meeting this need.
Non-Patent References mentioned above:
Borko, H. and Bernick, M. 1963. “Automatic document classification.” Journal of the Association for Computing Machinery, pp. 151-161.
Sparck Jones, K. 1970. “Some thoughts on classification for retrieval.” Journal of Documentation, pp. 89-102.
van Rijsbergen, C. J. 1979. Information Retrieval, 2nd edition, Butterworths, London.
Griffiths, A. and others. 1984. “Hierarchic agglomerative clustering methods for automatic document classification.” Journal of Documentation, pp. 175-205.
Willett, P. 1988. “Recent trends in hierarchic document clustering: A critical review.” Information Processing and Management, pp. 577-598.
Salton, G. and Buckley, C. 1990. “Flexible text matching for information retrieval.” Technical Report 90-1158, Cornell University, Ithaca, N.Y.
Farkas, J. 1993. “Neural networks and document classification.” Canadian Conference on Electrical and Computer Engineering, pp. 1-4.
Buckley, C. and others. 1994. “Automatic routing and ad-hoc retrieval using SMART: TREC-2.” The 2nd Text Retrieval Conference, edited by Donna Harman, NIST Special Publication 500-215, pp. 45-55.
Lewis, D. D. and Gale, W. A. 1994. “A sequential algorithm for training text classifiers.” Proceedings of the 7th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pp. 3-12, London.
Mitchell, T. 1997. Machine Learning, McGraw Hill, New York.
SUMMARY OF THE INVENTION
The inventive system and method provide an economical, scalable machine learning process that performs document classification with high accuracy using large topic schemes, including large hierarchical topic schemes. More specifically, the inventive system and method suggest one or more highly relevant classification topics for a given document to be classified.
The invention provides several features, including novel training and concept classification processes. The invention also provides novel methods that may be used as part of the training and/or concept classification processes, including: a method of scoring the relevance of features in training concepts, a method of ranking concepts based on relevance score, and a method of voting on topics associated with an input concept.
In a preferred embodiment, the invention is applied to the legal (case law) domain, classifying legal concepts (such as rules of law) according to a proprietary legal topic classification scheme (a hierarchy of areas of law).
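The summary names three component methods (scoring feature relevance in training concepts, ranking concepts by relevance score, and voting on topics associated with an input concept) but does not define them. The following is therefore only a generic sketch of how such a score, rank, and vote pipeline could fit together, in the style of a nearest-neighbor classifier. It is not the claimed method. Using inverse document frequency as the feature-relevance score, summed shared-feature weight as the concept score, and score-weighted voting over the top `k` concepts are all assumptions, as are the function names and the sample legal concepts.

```python
from collections import Counter
from math import log


def features(text):
    # Hypothetical feature extraction: the set of lowercase words in a concept.
    return set(text.lower().split())


def idf_weights(training):
    """Assumed feature-relevance score: inverse document frequency over training concepts."""
    n = len(training)
    df = Counter(f for text, _ in training for f in features(text))
    return {f: log(n / c) for f, c in df.items()}


def suggest_topics(concept, training, k=3, top=2):
    """Score and rank training concepts against `concept`, then vote on their topics.

    training: list of (concept_text, [topics]) pairs.
    Returns up to `top` topics that received a positive vote.
    """
    idf = idf_weights(training)
    query = features(concept)
    # Rank training concepts by the summed relevance of features shared with the input.
    scored = [(sum(idf[f] for f in query & features(text)), topics) for text, topics in training]
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    # Each of the k best-matching training concepts votes for its topics, weighted by score.
    votes = Counter()
    for score, topics in ranked:
        for topic in topics:
            votes[topic] += score
    return [topic for topic, v in votes.most_common(top) if v > 0]


training = [
    ("breach of contract damages", ["Contracts/Remedies"]),
    ("expectation damages for breach", ["Contracts/Remedies"]),
    ("offer and acceptance form a contract", ["Contracts/Formation"]),
    ("warrant requires probable cause", ["Criminal Procedure/Search"]),
]
print(suggest_topics("damages for breach of warranty", training))  # ['Contracts/Remedies']
```

Voting across several similar training concepts, rather than copying the topic of the single best match, lets agreement among neighbors outweigh one spurious match. That property is useful when sister topics are close, but how the actual invention weighs the votes is not stated in this summary.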


REFERENCES:
Borko, Harold et al., Automatic Document Classification, System Development Corp.; Nov. 1962, pp. 152-162.
Jones, Karen Sparck, Some Thoughts on Classification for Retrieval, University Mathematical Laboratory, Cambridge, MA, The Journal of Documentation, vol. 26, No. 2; Jun. 1970, pp. 89-101.
Griffiths, Alan et al., Hierarchic Agglomerative Clustering Methods for Automatic Document Classification, University of Sheffield, Western Bank, Sheffield, UK, The Journal of Documentation, vol. 40, No. 3; Sep. 1984, pp. 175-205.
Willett, Peter, Recent Trends in Hierarchic Document Clusteri
