Data processing: database and file management or data structures – Database design – Data structure types
Reexamination Certificate
2001-10-02
2004-06-15
Rones, Charles (Department: 2175)
C707S793000
active
06751607
ABSTRACT:
BACKGROUND OF THE PRESENT INVENTION
1. Field of the Present Invention
The present invention relates generally to database systems, and more particularly to systems and methods for the identification of latent relationships amongst data elements in very large computer databases.
2. Description of Related Art
Businesses, governments, agencies, and other institutions have long required people to fill out reports and forms. Many of these filings are simply forgotten. Paper forms are, in general, hard to file and access, so the purpose of requiring them is very often lost in the impracticality of dealing with so much paper.
The Federal suspicious activity report (SAR) is a case in point. A SAR is required to be filed by bank officials in every instance in which some unusual financial transaction or contact has occurred. Under Title 12 Code of Federal Regulations §21 (12 CFR 21), all financial institutions operating in the United States, including insured banks, savings associations, savings association service corporations, credit unions, bank holding companies, non-bank subsidiaries of bank holding companies, Edge and Agreement corporations, and American branches and agencies of foreign banks, are required to make this report following the discovery of insider abuse involving any amount, violations aggregating $5,000 or more where a suspect can be identified, violations aggregating $25,000 or more regardless of a potential suspect, or transactions aggregating $5,000 or more that involve potential money laundering or violations of the Bank Secrecy Act.
The information provided by SAR filings across the country is invaluable to law enforcement. But the data entry, indexing, and filing of paper reports result in a lot of information being lost to misfilings, alternative wordings, misspellings, typographical errors, and other inconsistencies. The Treasury Department therefore has a website (www.occ.treas.gov) that provides downloadable software for electronic filing and distribution of SARs. To make that report, the filing institution prepares a SAR and files it with the Financial Crimes Enforcement Network (FinCEN) of the Department of the Treasury through the IRS Detroit Computing Center. The reports are then made available electronically to appropriate law enforcement agencies.
Banks that do not wish to file electronically can continue to file paper-based reports. FinCEN has made copies of the forms available for download in Adobe Acrobat portable document format (PDF).
Databases such as the FinCEN SAR database are proliferating in almost every area of science, technology, and society in general. Information systems are getting larger, more complex, and more important. Data mining technology has been developed that extracts useful information from databases. The identification of latent relationships among data elements in very large databases is a particularly lofty goal of current systems. This problem is made even more difficult when the data contains errors or variations such as misspellings and abbreviations.
A variety of data mining methods have appeared on the commercial market in database software packages. But these prior-art products are only useful when applied to small databases, where the data is pristine, and when identifying the sought-for information is comparatively straightforward. Such conventional database systems are confounded as the size of the database, the noise in the data, and the complexity of the information increase.
High-performance data mining tools take advantage of the format and structure of a given set of data. The data format and data structure are usually not subject to change. Data users must understand the data well enough to be able to identify any relationships that may exist amongst the data elements, even though the nature of such relationships may not be known beforehand. Search speeds can be improved if each database is decomposed into separate components. Once decomposed, the detection of latent relationships amongst the seemingly independent data elements becomes easier, and the speed at which the database may be searched for information can be significantly enhanced.
Many databases use fixed-width data fields, while others use a combination of fixed fields, free text, and even graphic images. Fixed-field databases are the easiest to deal with, and conventional approaches to data mining may produce reasonable results provided the data is error-free. Free-text databases are more troublesome, particularly when any latent relationship information occurs in variable-width fields. Misspelled words and the use of abbreviations only complicate the job further.
SUMMARY OF THE PRESENT INVENTION
An object of the present invention is to provide a database method and system for the identification of any latent relationships amongst independent data elements in very large databases.
A further object of the present invention is to provide a database method and system for effective data mining by relatively inexperienced users.
Another object of the present invention is to provide a law enforcement tool for detecting obscure relationships and subtle activities of criminals and their organizations.
Briefly, a database method embodiment of the present invention quickly identifies latent relationships among data elements in very large databases. The database method converts each word in a document in a large database into a very wide digital-bit vector. Unique words appearing in the document are encoded into particular bits of the corresponding vector by writing ones. The possibility of a keyword being present in any document can therefore be quickly ascertained by throwing out all vectors with digital zeros in the critical bit positions that must be ones.
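The screening step described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the 1024-bit vector width, the SHA-256 based mapping from words to bit positions, and the names word_bit, encode_document, and candidate_documents are all assumptions made for illustration, and the sample reports are invented. The sketch keeps one vector per document, with a one written for each unique word.

```python
import hashlib
import re

VECTOR_BITS = 1024  # assumed width; the text says only "very wide"


def word_bit(word: str) -> int:
    """Map a word to a single bit position (hypothetical hashing scheme)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % VECTOR_BITS


def encode_document(text: str) -> int:
    """Build a bit vector with a one written for each unique word."""
    vector = 0
    for word in set(re.findall(r"[a-z0-9]+", text.lower())):
        vector |= 1 << word_bit(word)
    return vector


def candidate_documents(vectors: dict[str, int], keywords: list[str]) -> list[str]:
    """Throw out every vector that has a zero in a required bit position."""
    mask = 0
    for keyword in keywords:
        mask |= 1 << word_bit(keyword)
    return [doc_id for doc_id, vec in vectors.items() if vec & mask == mask]


# Invented sample data, for illustration only.
reports = {
    "sar-1": "Wire transfer to offshore account",
    "sar-2": "Cash deposit at branch",
}
vectors = {doc_id: encode_document(text) for doc_id, text in reports.items()}
print(candidate_documents(vectors, ["offshore"]))
```

Under these assumptions, a document that contains the keyword always survives the filter. A document that does not contain it may occasionally survive as well, when two different words map to the same bit, so the surviving vectors are candidates that still need to be confirmed against the full text.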
An advantage of the present invention is that a database search tool is provided that can scan exceedingly large databases quickly and efficiently.
Another advantage of the present invention is that a data mining tool is provided that can cope with alternate spellings, misspellings, and abbreviations used for keywords in report form inputs.
These and many other objects and advantages of the present invention will no doubt become obvious to those of ordinary skill in the art after having read the following detailed description of the preferred embodiments which are illustrated in the various drawing figures.
Brisbin Charles E.
Carlson Ralph
Kraay Thomas A.