Text correction for PDF converters
Data processing: presentation processing of document, operator i
Presentation processing of document
Edit, composition, or storage control
Reexamination Certificate
2005-09-02
2010-11-02
Ries, Laurie (Department: 2176)
C715S245000
active
07827484
ABSTRACT:
To correct at least one extraneous or missing space in a document, weights are assigned to tokens contained in a dictionary. Each token is defined by an ordered sequence of non-space symbols. The weights are assigned based on at least one of the token's length and its frequency of occurrence in the document. Corrected text is generated from the text of the document by applying an ordered sequence of symbol-level transformations selected from a group including at least (i) deleting a space, (ii) inserting a space, and (iii) copying a symbol. This ordered sequence of transformations is optimized with respect to an objective function that depends on the weights of the tokens in the corrected text.
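The abstract does not say how the optimal sequence of transformations is found. The Python sketch below shows one way the idea could work, under stated assumptions: the weighting formula len(t) ** 2 * (1 + count), the per-edit penalty EDIT_PENALTY, and the helper names assign_weights and correct_spaces are all hypothetical, and the optimizer is plain dynamic programming over token boundaries, chosen here for illustration rather than taken from the patent. Because non-space symbols are only ever copied, every sequence of copy, insert-space, and delete-space operations amounts to a choice of token boundaries, so the search scores each candidate token span and keeps the best-scoring segmentation.

```python
from collections import Counter

# Hypothetical cost charged for each inserted or deleted space; it makes the
# search keep the original spacing whenever two segmentations otherwise tie.
EDIT_PENALTY = 1.0


def assign_weights(dictionary, document):
    """Weight dictionary tokens by length and frequency in the document.

    len(t) ** 2 * (1 + count) is an illustrative assumption, not the patented
    formula. Squaring the length favours one long token over several short
    tokens that cover the same symbols.
    """
    counts = Counter(document.split())
    return {t: len(t) ** 2 * (1 + counts[t]) for t in dictionary}


def correct_spaces(line, weights):
    """Return `line` with spaces re-placed to maximise the weighted score.

    Non-space symbols are always copied, so each sequence of copy,
    insert-space and delete-space operations is equivalent to a choice of
    token boundaries. A run of several spaces is treated as one space.
    """
    symbols = []        # non-space symbols in input order
    space_before = []   # True if the input had a space before this symbol
    pending_space = False
    for ch in line:
        if ch.isspace():
            pending_space = True
            continue
        space_before.append(pending_space and len(symbols) > 0)
        symbols.append(ch)
        pending_space = False

    n = len(symbols)
    best = [float("-inf")] * (n + 1)  # best[j]: best score for symbols[:j]
    back = [0] * (n + 1)              # back[j]: start index of the last token
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(j):
            token = "".join(symbols[i:j])
            # Delete-space operations: input spaces that fall inside the token.
            edits = sum(space_before[i + 1:j])
            # Insert-space operation: a boundary where the input had no space.
            if i > 0 and not space_before[i]:
                edits += 1
            score = best[i] + weights.get(token, 0.0) - EDIT_PENALTY * edits
            if score > best[j]:
                best[j], back[j] = score, i

    # Follow the back-pointers to recover the chosen tokens.
    tokens = []
    j = n
    while j > 0:
        tokens.append("".join(symbols[back[j]:j]))
        j = back[j]
    return " ".join(reversed(tokens))


if __name__ == "__main__":
    weights = assign_weights({"the", "quick", "brown", "fox", "jumps"},
                             "the qu ick brownfox jumps")
    print(correct_spaces("the qu ick brownfox jumps", weights))
    # prints: the quick brown fox jumps
```

In this example the sketch deletes the space inside "qu ick" and inserts one inside "brownfox". Tokens missing from the dictionary score zero, so the edit penalty tends to leave their original spacing untouched; for example, correct_spaces("hello   world", {}) returns "hello world".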
REFERENCES:
patent: 5572423 (1996-11-01), Church
patent: 5715469 (1998-02-01), Arning
patent: 5933525 (1999-08-01), Makhoul et al.
patent: 6167369 (2000-12-01), Schulze
patent: 6618697 (2003-09-01), Kantrowitz et al.
patent: 2003/0216913 (2003-11-01), Keely et al.
patent: 2005/0007299 (2005-01-01), Gormish
patent: 2005/0034068 (2005-02-01), Jaeger
patent: 2007/0016862 (2007-01-01), Kuzmin
Taghva, Kazem et al., “An expert system for automatically correcting OCR output,” Information Science Research Institute, 1994.
Kukich, Karen, “Techniques for Automatically Correcting Words in Text,” ACM Computing Surveys, vol. 24, No. 4, Dec. 1992.
DCLab, “Converting from PDF to XML & MS Word: Avoiding the Pitfalls,” at http://www.dclab.com/converting—from—pdf.asp, p. 1, Oct. 3, 2003 and p. 2, Nov. 3, 2003.
Iceni Technology, “Gemini,” 4 pages, at http://www.iceni.com/content/Gemini/, last visited Jun. 30, 2005.
Forney, “The Viterbi Algorithm,” Proceedings of the IEEE, vol. 61, No. 3, Mar. 1973.
Nevill-Manning et al., “Extracting Text from Postscript,” Software-Practice and Experience, vol. 28, No. 5, pp. 481-491, Apr. 1998.
ScanSoft, OmniPage, at http://www.scansoft.com/omnipage/capturesdk/, 2 pages, last visited Jul. 14, 2005.
Cambridgedocs XML Conversion and Publishing Technologies, at http://www.cambridgedoc.com/, last visited Jul. 14, 2005.
Kempe et al., “WFSC—A New Weighted Finite State Compiler,” Lecture Notes in Computer Science, vol. 2759/2003, pp. 108-119, Aug. 2003.
Dejean, Herve
Kempe, Andre
Fay Sharpe LLP
Xerox Corporation