Patent
Filed: 1995-12-14
Issued: 1998-12-15
Coles, Edward L.
Image analysis
Image segmentation
Segmenting individual characters or words
382/179, 382/203, G06K 9/34
active
058504761
ABSTRACT:
A method of automatically identifying drop words (common, low-content words such as "the" or "of") in a document image without first performing character recognition to generate an ASCII representation of the document text. First, the document image is analyzed to identify word equivalence classes, each of which represents one or more of the words in the document. Second, for each word equivalence class, the likelihood that it is not a drop word is determined. Third, the document length is analyzed to determine whether the document is short. For a short document, the number of word equivalence classes identified as drop words, selected according to that likelihood, is proportional to the document length. For a long document, a fixed number of word equivalence classes is identified as drop words, again selected according to the likelihood that they are not drop words.
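The three steps in the abstract can be illustrated with a short Python sketch. It is not the patented method: word images are assumed to be already reduced to hashable shape signatures (width, height, a coarse shape code), the scoring heuristic in `not_drop_score` is hypothetical, and the thresholds `SHORT_DOC_WORDS` and `LONG_DOC_DROP_COUNT` are invented values, since the abstract gives none. Classes with the lowest "not a drop word" score are taken as drop words.

```python
from collections import Counter

# Hypothetical parameters: the abstract does not specify values.
SHORT_DOC_WORDS = 1000    # documents with fewer word images count as "short"
LONG_DOC_DROP_COUNT = 20  # fixed number of drop-word classes for long documents


def word_equivalence_classes(word_shapes):
    """Step 1: group word images whose (assumed) shape signatures match.

    Each signature is a hashable tuple such as (width_px, height_px, shape_code);
    identical signatures form one equivalence class, counted by occurrence.
    """
    return Counter(word_shapes)


def not_drop_score(shape, count, total_words):
    """Step 2 (hypothetical heuristic): score in [0, 1], higher = less likely a drop word.

    Assumes drop words are frequent and narrow, so the score falls with
    relative frequency and rises with word width (shape[0], in pixels).
    """
    width = shape[0]
    frequency = count / total_words
    return (1.0 - frequency) * min(width / 100.0, 1.0)


def drop_word_count(total_words):
    """Step 3: how many classes to mark as drop words, by document length."""
    if total_words < SHORT_DOC_WORDS:
        # Short document: proportional to length (scaled so it reaches the
        # fixed count at the threshold; this scaling is an assumption).
        return max(1, LONG_DOC_DROP_COUNT * total_words // SHORT_DOC_WORDS)
    # Long document: a fixed number of classes.
    return LONG_DOC_DROP_COUNT


def identify_drop_words(word_shapes):
    """Return the classes least likely to be non-drop words."""
    classes = word_equivalence_classes(word_shapes)
    total_words = len(word_shapes)
    ranked = sorted(
        classes,
        key=lambda shape: not_drop_score(shape, classes[shape], total_words),
    )
    n = min(drop_word_count(total_words), len(ranked))
    return ranked[:n]
```

For example, `drop_word_count(500)` gives 10 and `drop_word_count(5000)` gives 20, so the number of drop-word classes grows with length for short documents and stays fixed for long ones.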
REFERENCES:
patent: 3930237 (1975-12-01), Villers
patent: 4194221 (1980-03-01), Stoffel
patent: 4610025 (1986-09-01), Blum et al.
patent: 4741045 (1988-04-01), Denning
patent: 4907283 (1990-03-01), Tanaka et al.
patent: 4965763 (1990-10-01), Zamora
patent: 5077668 (1991-12-01), Doi
patent: 5131049 (1992-07-01), Bloomberg et al.
patent: 5181255 (1993-01-01), Bloomberg
patent: 5202933 (1993-04-01), Bloomberg
patent: 5257186 (1993-10-01), Ukita et al.
patent: 5297027 (1994-03-01), Morimoto et al.
patent: 5315671 (1994-05-01), Higuchi
patent: 5321770 (1994-06-01), Huttenlocher et al.
patent: 5325444 (1994-06-01), Cass et al.
patent: 5384864 (1995-01-01), Spitz
patent: 5390259 (1995-02-01), Withgott et al.
patent: 5396566 (1995-03-01), Bruce et al.
patent: 5410611 (1995-04-01), Huttenlocher et al.
patent: 5410612 (1995-04-01), Arai et al.
patent: 5442715 (1995-08-01), Gaborski et al.
patent: 5444797 (1995-08-01), Spitz
patent: 5488719 (1996-01-01), Kaplan et al.
patent: 5491760 (1996-02-01), Withgott et al.
patent: 5495349 (1996-02-01), Ikeda
patent: 5526443 (1996-06-01), Nakayama
patent: 5544259 (1996-08-01), McCubbrey
patent: 5550934 (1996-08-01), Van Vliembergen et al.
patent: 5638543 (1997-06-01), Pedersen et al.
Bloomberg, Dan S. and Luc Vincent. "Blur Hit-Miss Transform and Its Use in Document Image Pattern Detection," Proceedings SPIE Conference 2422, Document Recognition II, San Jose, CA, Feb. 6-7, 1995, pp. 278-292.
Bloomberg, Dan S. et al. "Measuring Document Image Skew and Orientation," Proceedings SPIE Conference 2422, Document Recognition II, San Jose, CA, Feb. 6-7, 1995, pp. 302-316.
Bloomberg, Dan S. "Multiresolution Morphological Analysis of Document Images," Proceedings SPIE Conference 1818, Visual Communications and Image Processing '92, Boston, MA, Nov. 18-20, 1992, pp. 648-662.
Chen, Francine R. et al. "Spotting Phrases in Lines of Imaged Text," Proceedings SPIE Conference 2422, Document Recognition II, San Jose, CA, Feb. 6-7, 1995, pp. 256-269.
Chen, Francine R. and Margaret Withgott. "The Use of Emphasis to Automatically Summarize a Spoken Discourse," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, San Francisco, CA, Mar. 23-26, 1992, pp. 229-232.
Cheong, Tong L. and Tan S. Lip. "A Statistical Approach to Automatic Text Extraction," Institute of Systems Science; Asian Library Journal, pp. 1-8.
Jones, Karen S. and Brigitte Endres-Niggemeyer. "Automatic Summarizing," Information Processing & Management, vol. 31, No. 5, pp. 625-630, 1995.
Jones, Karen S. "What Might Be in a Summary?," Information Retrieval 93: Von der Modellierung zur Anwendung (ed. Knorz, Krause and Womser-Hacker), Universitätsverlag Konstanz, 1993, pp. 9-26.
Jones, Richard L. "AIDA the Artificially Intelligent Document Analyzer," in C. McDonald and J. Weckert, eds., Proceedings of a Conference and Workshop on Libraries and Expert Systems, Riverina, Australia, Jul. 1990, pp. 49-57.
Jones, Richard L. and Dan Corbett. "Automatic Document Content Analysis: The AIDA Project," Library Hi Tech, vol. 10, No. 1-2 (1992), issue 37-38, pp. 111-117.
Kupiec, Julian et al. "A Trainable Document Summarizer," Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, Jul. 9-13, 1995, pp. 68-73.
Luhn, H. P. "The Automatic Creation of Literature Abstracts," IBM Journal of Research and Development, vol. 2, No. 2, Apr. 1958, pp. 159-165.
Luhn, H. P. "A Business Intelligence System," IBM Journal of Research and Development, vol. 2, No. 4, Oct. 1958, pp. 314-319.
Paice, Chris D. "Constructing Literature Abstracts by Computer: Techniques and Prospects," Information Processing & Management, vol. 26, No. 1, pp. 171-186, 1990.
Paice, Chris D. and Paul A. Jones. "The Identification of Important Concepts in Highly Structured Technical Papers," Proceedings of the Sixteenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Pittsburgh, PA, Jun. 27-Jul. 1, 1993, pp. 69-78.
Rath, G. J. et al. "The Formation of Abstracts by the Selection of Sentences: Part I. Sentence Selection by Men and Machines," American Documentation, Apr. 1961, pp. 139-143.
Salton, Gerard et al. "Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Texts," Science, vol. 264, Jun. 3, 1994, pp. 1421-1426.
Chen, Francine R.
Tukey, John W.
Coles, Edward L.
Hurt, Tracy L.
Lee, Cheukfan
Xerox Corporation