Method for rule-based correction of spelling and grammar errors

Reexamination Certificate

Details

Classification: C703S002000, C704S008000, C704S009000, C704S239000, C704S240000

Status: active

Document number: 06618697

ABSTRACT:

MICROFICHE APPENDIX
A microfiche appendix containing source code in the LISP language is filed herewith. It comprises 6 microfiche and 493 frames.
COPYRIGHT NOTICE
A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
BACKGROUND OF THE INVENTION
Although interactive spelling checkers are widely available, many users avoid them because they are tedious. Interactive spelling checkers query the user about every word that does not appear in the dictionary, even though most such words are valid. Such dictionary-based systems also fail to detect valid-word errors, where the user accidentally substitutes one valid word for another. Even when interactive systems do catch an error (e.g., when the error yields a word that is not in the dictionary), their first-guess accuracy is low, forcing the user to select the correct word from a list of candidate alternatives. If these systems automatically substituted the top-ranked candidate correction, the low first-guess accuracy would make more than half of the automatic substitutions incorrect. Because of the extra effort involved and the tedious user interfaces, many users choose not to use interactive spelling checkers.
SUMMARY OF THE INVENTION
The present invention addresses these problems with known interactive spelling checkers. Because it has near-perfect first-guess accuracy, it can correct errors automatically as the user types without introducing new errors. It shifts the emphasis from recognizing valid words to recognizing errors. Identifying the nature of an error often allows the error to be corrected even when there is no similar word in the valid-word dictionary. Existing systems based on dictionaries of common spelling errors and their associated corrections can recognize only the errors explicitly listed in the dictionary; a typical error dictionary contains about a thousand of the most common errors. The present invention instead provides a rule-based method for detecting and correcting spelling and grammar errors. The invention is not guaranteed to catch all errors, but the errors it does correct are extremely likely to be genuine spelling and grammar errors. A variation of the invention for handwriting recognition and optical character recognition (OCR) improves the recognition accuracy of those systems.
A “regular expression” is a computer programming construct that comprises an n-gram template to be matched against a string of characters in a word. The n-gram template string may comprise less than all characters in the word. Matching the string either succeeds or fails. A matched pattern may cause addition, deletion, transposition and/or substitution of characters in the word. The n-gram template may comprise alternative characters, wild card characters and position indicators.
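The n-gram templates described above map naturally onto a modern regular-expression engine. The following is a minimal sketch using Python's standard `re` module, not the patent's LISP implementation. The four rules and the helper `apply_rule` are hypothetical illustrations: `^` and `$` act as position indicators, `[wt]` and `[cs]` are alternative characters, `.` is a wild card, and each template covers only part of the word unless it is anchored at both ends.

```python
import re

# Hypothetical rules, each an (n-gram template, replacement) pair.
RULES = {
    # Position indicators ^ and $ tie the template to the whole word.
    "whole_word": (re.compile(r"^teh$"), "the"),
    # Template covers only part of the word; (?=v) requires a following "v".
    "transposition": (re.compile(r"cie(?=v)"), "cei"),
    # Alternative characters [wt] and [cs], anchored at the start of the word.
    "alternatives": (re.compile(r"^([wt])ih(?=[cs])"), r"\1hi"),
    # Wild card (.) plus a back-reference: a tripled character becomes doubled.
    "deletion": (re.compile(r"(.)\1\1"), r"\1\1"),
}


def apply_rule(word, rule):
    """Return the corrected word if the template matches, otherwise None."""
    pattern, replacement = rule
    if pattern.search(word) is None:
        return None  # the match failed, so the rule does not fire
    return pattern.sub(replacement, word, count=1)


print(apply_rule("recieve", RULES["transposition"]))  # receive
print(apply_rule("wihch", RULES["alternatives"]))     # which
print(apply_rule("committtee", RULES["deletion"]))    # committee
```

Because matching either succeeds or fails, `apply_rule` returns `None` for a word the template does not match (for example, `apply_rule("receive", RULES["transposition"])`), leaving that word untouched.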
Briefly, according to one embodiment of this invention, there is provided a computer-implemented method for correcting spelling errors in a sequence of words that does not require a stored dictionary of valid words. The method comprises the step of storing a plurality of spelling rules defined as regular expressions, each for matching a potentially illegal n-gram, which may comprise fewer than all the letters in the word, and for replacing an illegal n-gram with a legal n-gram to return a corrected word. A word from the sequence of words is submitted to the spelling rules. If a corrected word is returned, it is substituted for the misspelled word in the sequence of words. The method may further comprise submitting a corrected word to at least one additional rule.
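The dictionary-free method of this embodiment can be sketched as a loop over stored rules. This is an illustrative Python sketch under stated assumptions: the rule list and the names `correct_word` and `correct_sequence` are hypothetical, and rules are applied in a fixed order so that a corrected word is automatically submitted to the remaining rules.

```python
import re

# Hypothetical spelling rules: (regular expression, replacement).
# No dictionary of valid words is consulted anywhere.
SPELLING_RULES = [
    (re.compile(r"^teh$"), "the"),
    (re.compile(r"^recc"), "rec"),    # word-initial "recc" -> "rec"
    (re.compile(r"cie(?=v)"), "cei"), # recieve -> receive
]


def correct_word(word, rules):
    """Submit a word to every rule in order.

    Each rule sees the output of the rules before it, so a corrected word is
    itself submitted to the remaining rules. Returns the corrected word, or
    None if the word came back unchanged.
    """
    current = word
    for pattern, replacement in rules:
        if pattern.search(current):
            current = pattern.sub(replacement, current, count=1)
    return current if current != word else None


def correct_sequence(words, rules=SPELLING_RULES):
    """Substitute each returned correction for the misspelled word."""
    corrected = []
    for word in words:
        fixed = correct_word(word, rules)
        corrected.append(word if fixed is None else fixed)
    return corrected


print(correct_sequence(["I", "reccieve", "teh", "mail"]))
# ['I', 'receive', 'the', 'mail']
```

In this sketch "reccieve" passes through two rules in turn ("recc" becomes "rec", then "cie" becomes "cei"), which illustrates submitting a corrected word to an additional rule. With a larger rule set, rule order would need more careful design.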
According to another embodiment of this invention, there is provided a method of correcting both spelling errors and grammar errors. The method comprises storing a plurality of spelling and grammar rules defined as regular expressions that apply in the context of one or more adjacent words. At least two adjacent words at a time from the sequence of words are submitted to the rules. If a corrected word or sequence of corrected words is returned, it is substituted into the sequence of words.
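Submitting two adjacent words at a time amounts to sliding a two-word window over the sequence. The sketch below assumes a rule format invented for illustration: a template for each word of the pair plus replacements for either word. The two grammar rules and the function `correct_pairs` are hypothetical examples, not the patent's rules.

```python
import re

# Hypothetical two-word rules: (first-word template, second-word template,
# (new first word, new second word)). None keeps the original word.
PAIR_RULES = [
    # "could of" -> "could have" (likewise should/would)
    (re.compile(r"^(could|should|would)$", re.IGNORECASE), re.compile(r"^of$"), (None, "have")),
    # "a" before a word starting with a, e, i or o -> "an"
    # (u is left out because of words like "unit"; a real rule would need exceptions)
    (re.compile(r"^a$"), re.compile(r"^[aeio]", re.IGNORECASE), ("an", None)),
]


def correct_pairs(words, rules=PAIR_RULES):
    """Slide a two-word window over the sequence and apply the first matching rule."""
    words = list(words)
    for i in range(len(words) - 1):
        first, second = words[i], words[i + 1]
        for first_pat, second_pat, (new_first, new_second) in rules:
            if first_pat.search(first) and second_pat.search(second):
                if new_first is not None:
                    words[i] = new_first
                if new_second is not None:
                    words[i + 1] = new_second
                break  # at most one rule fires per window
    return words


print(correct_pairs("you could of seen a elephant".split()))
# ['you', 'could', 'have', 'seen', 'an', 'elephant']
```

Neither "of" nor "a" is misspelled, so a dictionary-based checker would accept both. The errors become visible only in the context of the neighboring word.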
Preferably, an exception list is associated with each regular expression or with the system as a whole to prevent n-gram replacement where the word matches an exception to the rule. Preferably, the spelling rules match potentially illegal n-grams comprising two or more characters. More preferably, the spelling rules recognize and correct complex types of errors in addition to simple insertions, deletions, substitutions and transpositions.
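A per-rule exception list and a system-wide exception list can both be checked before a rule fires. The following Python sketch assumes a hypothetical `Rule` class and a hypothetical word-final "tino" to "tion" rule. The exception words are examples chosen because they are valid words ending in "tino".

```python
import re


class Rule:
    """Hypothetical rule: a template, its replacement and a per-rule exception list."""

    def __init__(self, template, replacement, exceptions=()):
        self.pattern = re.compile(template)
        self.replacement = replacement
        self.exceptions = {e.lower() for e in exceptions}

    def apply(self, word, global_exceptions=frozenset()):
        """Return the corrected word, or None if the rule does not fire.

        global_exceptions is assumed to hold lower-case words.
        """
        key = word.lower()
        if key in self.exceptions or key in global_exceptions:
            return None  # the word is a listed exception: leave it alone
        if self.pattern.search(word) is None:
            return None
        return self.pattern.sub(self.replacement, word, count=1)


# Word-final "tino" is usually a typo for "tion" (informatino -> information),
# but some valid words end in "tino" and must be listed as exceptions.
TINO_RULE = Rule(r"tino$", "tion", exceptions=["latino", "valentino", "palatino"])

# System-wide exceptions, e.g. names added by the user (hypothetical).
GLOBAL_EXCEPTIONS = {"martino"}

print(TINO_RULE.apply("informatino"))                 # information
print(TINO_RULE.apply("Latino"))                      # None
print(TINO_RULE.apply("Martino", GLOBAL_EXCEPTIONS))  # None
```

Without the global list, the same rule would turn "Martino" into "Martion". That is why an exception mechanism matters for rules built on partial-word n-grams.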
Applications of the methods disclosed herein include word processing programs that automatically correct errors as the user types, word processing programs with batch spelling correction, optical character reader programs and automatic handwriting recognition programs.
Most preferably, the methods according to this invention include storing spelling rules that use multiple words of context to identify spelling errors, confusable words and common grammar errors; to select a unique correction where more than one correction is possible; or to identify word-boundary errors, namely missing spaces, inserted spaces, shifted spaces and combinations thereof.
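Word-boundary errors cannot be fixed one word at a time, because the template itself must contain (or omit) a space. This Python sketch therefore applies hypothetical boundary rules to a run of text, with one example each for a missing, an inserted and a shifted space. The rule list and `fix_boundaries` are illustrative assumptions.

```python
import re

# Hypothetical word-boundary rules applied to a whole run of text.
BOUNDARY_RULES = [
    (re.compile(r"\bofthe\b"), "of the"),   # missing space
    (re.compile(r"\bth e\b"), "the"),       # inserted space
    (re.compile(r"\bont he\b"), "on the"),  # shifted space
]


def fix_boundaries(text, rules=BOUNDARY_RULES):
    """Apply each boundary rule in turn to the whole text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(fix_boundaries("most ofthe books are ont he shelf by th e door"))
# most of the books are on the shelf by the door
```

Note that "ont" and "he" in the shifted-space case could each pass a word-by-word check ("he" is valid), so only a template spanning both words reveals the error.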
According to a preferred embodiment, the stored rules include constraints based on case restrictions, parts of speech, capitalization and/or punctuation appearing within the sequence of words.
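Case restrictions and capitalization can be sketched as a flag on each rule plus a step that carries the original word's capitalization over to the replacement. This is a Python sketch with hypothetical rules and function names. The `lowercase_only` flag stands in for a case-restriction constraint; part-of-speech and punctuation constraints are not modeled here.

```python
import re

# Hypothetical whole-word rules: (template, replacement, lowercase_only).
# lowercase_only is a case-restriction flag: the rule fires only on words
# typed entirely in lower case.
CASE_RULES = [
    (re.compile(r"^teh$", re.IGNORECASE), "the", False),
    (re.compile(r"^im$", re.IGNORECASE), "I'm", True),  # leaves "IM" (instant message) alone
]


def match_case(original, replacement):
    """Carry the capitalization of the original word over to the replacement."""
    if len(original) > 1 and original.isupper():
        return replacement.upper()
    if original[:1].isupper():
        return replacement[:1].upper() + replacement[1:]
    return replacement


def apply_case_rules(word, rules=CASE_RULES):
    """Return the corrected word, or None if no rule fires."""
    for pattern, replacement, lowercase_only in rules:
        if lowercase_only and not word.islower():
            continue  # case restriction not met
        if pattern.search(word):
            return match_case(word, replacement)
    return None


print(apply_case_rules("Teh"))  # The
print(apply_case_rules("im"))   # I'm
print(apply_case_rules("IM"))   # None
```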
The methods according to this invention may also include a step of generating potential spelling rules defined as regular expressions. This step comprises selecting as templates the letters of errors in an error corpus plus zero or more letters of context, to identify a set of potential rules, and then pruning from that set the rules that are too general, too specific or do not identify the cause of the error. New rules may also be generated from the user's manual corrections.
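Rule generation can be approximated by locating where each misspelling differs from its correction, widening that region by zero to `max_context` letters of context, and pruning the candidates. This Python sketch is one possible reading, not the patent's procedure. The thresholds and the corpus are hypothetical. "Too general" is approximated as firing on a word in a small list of known-valid words, and "too specific" as being supported by fewer than `min_support` corpus errors. The "does not identify the cause of the error" criterion is not modeled. Templates here are literal letter strings, escaped into regular expressions on output.

```python
import re


def diff_span(error, correction):
    """Locate the differing region by stripping the common prefix and suffix."""
    shortest = min(len(error), len(correction))
    p = 0
    while p < shortest and error[p] == correction[p]:
        p += 1
    s = 0
    while s < shortest - p and error[-1 - s] == correction[-1 - s]:
        s += 1
    return p, len(error) - s, len(correction) - s


def generate_rules(error_corpus, valid_words, max_context=2, min_support=2):
    """Propose (template, replacement) rules and prune the weak ones."""
    candidates = set()
    for err, cor in error_corpus:
        p, err_end, cor_end = diff_span(err, cor)
        for k in range(max_context + 1):  # zero or more letters of context
            left = err[max(0, p - k):p]
            right = err[err_end:err_end + k]
            candidates.add((left + err[p:err_end] + right,
                            left + cor[p:cor_end] + right))

    rules = []
    for template, replacement in sorted(candidates):
        if len(template) < 2:
            continue  # single letters are too general
        if any(template in word for word in valid_words):
            continue  # would fire on a valid word: too general
        support = sum(1 for err, cor in error_corpus
                      if template in err and err.replace(template, replacement, 1) == cor)
        if support >= min_support:  # otherwise too specific
            rules.append((re.escape(template), replacement, support))
    return rules


CORPUS = [("recieve", "receive"), ("decieve", "deceive"),
          ("concieve", "conceive"), ("beleive", "believe")]
VALID = ["believe", "piece", "science", "achieve"]
print(generate_rules(CORPUS, VALID, min_support=3))  # [('ciev', 'ceiv', 3)]
```

In this toy run the context-free template "ie" is pruned because it occurs in "believe", while the one-letter-context template "ciev" covers three of the four errors. This mirrors the idea that a rule capturing the cause of an error corrects many misspellings at once.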
A further embodiment of this invention comprises a context-sensitive word completion method comprising the steps of storing a plurality of word completion rules defined as regular expressions for matching an n-gram, which may comprise fewer than all letters in the word, and for replacing a matched n-gram with an n-gram that completes the word given the context of one or more preceding words. The previous word and the n-gram comprising the initial letters of the word being typed are submitted to the rules. If a rule fires, the word being typed is completed automatically.
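Context-sensitive completion pairs a template on the previous word with a template on the letters typed so far. The following Python sketch uses hypothetical completion rules and a hypothetical `complete` function. The extra `startswith` check ensures the completion is consistent with everything already typed.

```python
import re

# Hypothetical completion rules: (previous-word template, typed-prefix template, completion).
COMPLETION_RULES = [
    (re.compile(r"^united$", re.IGNORECASE), re.compile(r"^st"), "states"),
    (re.compile(r"^in$", re.IGNORECASE), re.compile(r"^addi"), "addition"),
]


def complete(previous_word, typed_prefix, rules=COMPLETION_RULES):
    """Return the completed word if a rule fires, otherwise None."""
    for prev_pat, prefix_pat, completion in rules:
        if (prev_pat.search(previous_word)
                and prefix_pat.search(typed_prefix)
                and completion.startswith(typed_prefix.lower())):
            return completion
    return None


print(complete("United", "st"))  # states
print(complete("in", "addi"))    # addition
print(complete("the", "addi"))   # None: same prefix, different context
```

The same prefix completes differently (or not at all) depending on the preceding word, which is what makes the completion context-sensitive rather than a plain prefix lookup.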
The present invention goes beyond the state of the art by recognizing more than just isolated whole-word errors. It uses rules that recognize error patterns and their associated corrections. An error dictionary that contains only whole words can correct only as many errors as are listed in the dictionary. The rules used by the present invention can each correct numerous common errors without reference to a valid word dictionary. In essence, the present invention is not just recognizing the error, but also recognizing the cause of the error. This yields much more productive rules and, hence, a more powerful system.
The rules used by this invention are implemented by use of regular expressions, case-restriction flags, space deletion, insertion and shifting, and multiple words of context (including not just whole words and parts of speech, but also regular expressions). This allows the system to correct errors in a context-sensitive fashion, correct word-boundary errors and correct many valid word errors. The present invention can also correct many grammatical and lexical choice errors.
Regular expressions used by this invention include not just sequences of alphanumeric characters and start-word and end-word flags, but also more abstract patterns, such as left and right handedness of the letters, sets of letters, and the l
