Reexamination Certificate
2000-11-10
2003-10-28
Robinson, Greta (Department: 2177)
Data processing: database and file management or data structures
Database design
Data structure types
C707S793000
active
06640228
ABSTRACT:
TECHNICAL FIELD
The present invention is directed to data categorization, and more particularly to a method for detecting incorrectly categorized data that is used in directories, such as telephone directories.
BACKGROUND ART
Phone directories and other information sources containing large amounts of data often provide the data in two formats: a general listing that simply lists business names and phone numbers alphabetically (often called “white pages”) and a grouped listing that lists business names and phone numbers under selected categories (often called “yellow pages”). The grouped listings usually include business names and telephone numbers that are grouped by business category so that the phone numbers for similar types of businesses are found in the same location in the phone directory.
Problems occur, however, if the category information, such as the business category of a business phone listing, is incorrect. More particularly, failure to thoroughly check the category assignment data or process for accuracy and timeliness may cause data miscategorization or categorization of incorrect or outdated data. Categorization of telephone data is usually conducted by scanning telephone directories and using optical character recognition and/or human labor to match the data in the general listing with an appropriate category in the grouped listing. Errors may occur if, for example, the information in the general listing and the grouped listing does not match or is otherwise defective (e.g., if a phone number in the general listing belongs to a party that has gone out of business, if the same phone number is assigned to two different businesses due to irregularities in the change history of the phone number, or if a given phone number is reassigned to a business that is different from the business that previously had the same phone number). Failure to detect incorrectly categorized data may result in directories that list multiple phone numbers for the same business or assign unrelated businesses to the same category. Failure to delete outdated entries may also add to categorization errors.
Although manual review of the entries and categories can detect categorization assignment errors, applying the same level of review to all of the assignments is inefficient and cumbersome, particularly if the data being checked includes a large number of pairs and a relatively small number of errors.
There is a need for a method or system that can detect the existence of incorrectly categorized data so that the incorrectly categorized data can be located and fixed efficiently.
SUMMARY OF THE INVENTION
Accordingly, the present invention is directed to a method for determining the reliability of a data category assignment, and more particularly for determining the accuracy of an assignment of an entry to a given category. The method includes the steps of obtaining a database containing a plurality of entry-category pairs and calculating a score for each entry-category pair, wherein the score corresponds to a likelihood that the entry is correctly assigned to the category. The score itself can be calculated based on the relative probability between the occurrence of a particular entry-category pair and the number of occurrences of the entry and the category, separately, in the database.
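The scoring idea in the preceding paragraph can be illustrated with a minimal sketch. The text does not give an exact formula, so this example assumes a log-ratio of the observed pair frequency to the product of the separate entry and category frequencies (a pointwise-mutual-information style score). The function name `score_pairs`, the sample entries, and the sample categories are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def score_pairs(pairs):
    """Score each distinct (entry, category) pair.

    Assumed formula (not taken from the patent text):
        score = log( P(entry, category) / (P(entry) * P(category)) )
    A low score means the pair occurs less often than the separate
    frequencies of its entry and its category would suggest.
    """
    pairs = list(pairs)
    total = len(pairs)
    pair_counts = Counter(pairs)
    entry_counts = Counter(entry for entry, _ in pairs)
    category_counts = Counter(category for _, category in pairs)

    scores = {}
    for (entry, category), n_pair in pair_counts.items():
        p_pair = n_pair / total
        p_entry = entry_counts[entry] / total
        p_category = category_counts[category] / total
        scores[(entry, category)] = math.log(p_pair / (p_entry * p_category))
    return scores


# Hypothetical database: one listing is filed under an unlikely category.
sample_pairs = (
    [("pizza", "Restaurants")] * 3
    + [("plumbing", "Plumbers")] * 3
    + [("pizza", "Plumbers")]
)
sample_scores = score_pairs(sample_pairs)
```

In this toy data the pair `("pizza", "Plumbers")` receives a negative score while the two consistent pairs receive positive scores, which is the kind of separation the score is meant to provide.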
In one embodiment, the method includes sorting the pairs according to the scores and generating a curve based on the calculated scores to indicate the likelihood that a given portion of the sorted pairs will contain accurately or inaccurately categorized data and/or to estimate the number of inaccurate data categorizations. The method may also include checking the pairs against an existing reference database. Once a data region having a higher likelihood of errors has been identified via the inventive system, any manual review of the data can be more efficiently targeted toward error-prone data rather than correctly categorized data.
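The sorting and curve-generation step can be sketched as follows. The text does not specify how the curve is constructed, so this example assumes a small hand-checked set of known errors and plots, for each prefix of the sorted list, the fraction of those known errors already covered. The names `rank_for_review`, `cumulative_error_curve`, and `known_errors`, as well as the score values, are hypothetical.

```python
def rank_for_review(scores):
    """Return pairs sorted from lowest score (most suspect) to highest."""
    return sorted(scores, key=scores.get)


def cumulative_error_curve(ranked, known_errors):
    """Return (fraction of pairs reviewed, fraction of known errors found).

    known_errors is an assumed hand-labeled set of miscategorized pairs;
    the curve shows how quickly reviewing in score order uncovers them.
    """
    curve = []
    found = 0
    total_errors = len(known_errors)
    for position, pair in enumerate(ranked, start=1):
        if pair in known_errors:
            found += 1
        fraction_found = found / total_errors if total_errors else 0.0
        curve.append((position / len(ranked), fraction_found))
    return curve


# Hypothetical scores for three entry-category pairs.
example_scores = {
    ("pizza", "Restaurants"): 0.56,
    ("plumbing", "Plumbers"): 0.56,
    ("pizza", "Plumbers"): -0.83,
}
ranked = rank_for_review(example_scores)
curve = cumulative_error_curve(ranked, {("pizza", "Plumbers")})
```

A curve that rises steeply near the start suggests that manual review can stop after the lowest-scoring portion of the list, since most of the errors are concentrated there.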
REFERENCES:
patent: 5251131 (1993-10-01), Masand et al.
patent: 5488725 (1996-01-01), Turtle et al.
patent: 5675710 (1997-10-01), Lewis
patent: 5704004 (1997-12-01), Li et al.
patent: 5799278 (1998-08-01), Cobbett et al.
patent: 5943670 (1999-08-01), Prager
patent: 6003027 (1999-12-01), Prager
patent: 6137911 (2000-10-01), Zhilyaev
patent: 6161130 (2000-12-01), Horvitz et al.
patent: 6167369 (2000-12-01), Schulze
patent: 6182083 (2001-01-01), Scheifler et al.
patent: 6192360 (2001-02-01), Dumais et al.
patent: 6233575 (2001-05-01), Agrawal et al.
patent: 6374241 (2002-04-01), Lamburt et al.
patent: 6400806 (2002-06-01), Uppaluru
patent: 6408294 (2002-06-01), Getchius et al.
patent: 6466918 (2002-10-01), Spiegel et al.
patent: 6466928 (2002-10-01), Blasko et al.
patent: 6489968 (2002-12-01), Ortega et al.
Handerson Steven Kendall
Ponte Jay Michael
Le Miranda
Robinson Greta
Suchyta Leonard Charles
Verizon Laboratories Inc.
Weixel James K.