Apparatus and method for probabilistic population size and...

Data processing: measuring – calibrating – or testing – Measurement system – Statistical measurement

Reexamination Certificate

C707S793000

active

06470298

ABSTRACT:

REFERENCE TO MICROFICHE APPENDIX
A computer source code listing containing a preferred embodiment of the present invention is included in a microfiche appendix, appended hereto, having one microfiche and fourteen frames.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an apparatus and method for determining population size and overlap within information sources. More specifically, the present invention relates to a statistical technique for measuring population overlap without reliance on unique identifiers, and provides an alternative and superior method for determining population size.
2. Description of the Related Art
Government and private industry need to know, for purposes of management, monitoring, and evaluation, the number of people who are involved in more than one institution, program, group, or activity, either concurrently or in sequence.
The measurement of population overlap has been hampered by both the complexity of the social institutions and the lack of unique personal identifiers across existing data sets. Until now, the determination of the number of individuals shared across sub-populations has relied on one or more of three approaches to the problem: (1) the construction of detailed case registries (single data sets); (2) implementation of a true unique id system (e.g. National ID card) across multiple data sets; and (3) case by case matching of records from multiple data sets that describe the members and/or activities of various organizations and service sectors.
Traditionally, the problem of data set overlap has been most commonly approached by the development of case registries. The Gulf War Registry, designed to allow medical researchers to determine the prevalence and distribution of Gulf War Syndrome, is one current example. The National Breast Cancer Registry is another. In the 1960s, a number of states established psychiatric case registries in order to determine the prevalence and distribution of mental illness. In every case, the problem was the same: existing fragmented information systems could not support the critical epidemiological function of determining the relationships among existing data sets. There are three important shortcomings to this approach. First, the creation of case registries is a very expensive undertaking. Second, the completeness of a registry is always in question, especially when participation is voluntary. The incompleteness of the Gulf War Registry is notorious. Finally, because registries necessarily include personal identifiers, their creation raises important issues about personal privacy and the confidentiality of personal records.
The implementation of universal true unique personal identifier systems provides a second solution to the problem of determining the number of people involved in different subpopulations. While the implementation of such identification systems has been successfully accomplished for specific organizations (e.g. individual hospitals, correctional facilities, and insurance companies), these identification systems do not constitute the kind of universal identification systems that allow for analysis of membership overlap. In the United States, the social security number comes close to providing a universal identification system, but concerns about personal privacy severely limit the availability of these identifiers in settings not directly related to the social security system.
Case by case matching of records from multiple data sets, based on the names of people or other identifiers that may be shared by more than one data set, is a third approach to the problem. Case by case database integration on a patient-specific basis has been utilized in a number of fields. From a practical point of view, this approach has two major shortcomings. First, it is tedious, time-consuming, and expensive. Second, it introduces an unquantifiable degree of error. This approach also depends on personal identifiers, so concerns about privacy and confidentiality are likely to limit its utilization.
The problem of measuring the overlap between populations where no unique person identifier exists is related to the problem of measuring population size (the number of distinct individuals) without a unique person identifier. The problem of estimating population size may, in fact, be seen as a constituent part of the larger problem of estimating population overlap. In the past, the measurement of the number of people represented in a single data set that does not include a unique person identifier has relied on one of two statistical approaches. One statistical approach applies the capture-recapture sampling technique to the problem. This approach is illustrated by Abeni et al., “Capture-Recapture to Estimate the Size of the Population with Human Immunodeficiency Virus Type 1 Infection,” Epidemiology, Volume 5, Number 4, July 1994 (pp. 410-414). The other statistical technique is based on classical occupancy theory, as discussed by Feller, “An Introduction to Probability Theory and Its Applications,” Volume 1, Second Edition, 1957. Classical occupancy theory is described on pages 210-211 and 224 of Feller's text. One implementation of classical occupancy theory has been provided by Larsen, “Estimation of the Number of People in a Register from the Number of Birthdates,” Statistics in Medicine, Volume 13, 1994 (pp. 177-183). The present invention uses a fundamentally different, and far superior, implementation of classical occupancy theory.
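The core occupancy relationship can be written out as a worked equation. As an illustrative assumption, suppose each of N distinct individuals carries a key (for example, a birthdate) drawn independently and uniformly from K possible values; real birthdates are not exactly uniform, so this is an idealization. The number D of distinct key values observed in the data then behaves as follows, and setting the observed D equal to its expectation gives a simple moment-type estimate of N:

```latex
\Pr[\text{a given key value is unused}] = \left(1 - \frac{1}{K}\right)^{N},
\qquad
\mathbb{E}[D] = K\left[1 - \left(1 - \frac{1}{K}\right)^{N}\right],
\qquad
\hat{N} = \frac{\ln(1 - D/K)}{\ln(1 - 1/K)} \quad (D < K).
```

Because repeated records of the same individual repeat that individual's key, D depends on the number of distinct individuals rather than on the number of records. This moment-type inversion is shown only to make the idea concrete; it is not Larsen's maximum likelihood procedure and not the method of the present invention.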
The capture-recapture technique is, in essence, case by case matching of small samples of larger populations. It avoids the cost associated with complete case by case matching, but still raises issues of personal privacy and confidentiality because it relies on personal identifiers for a subset of the population. Capture-recapture was originally developed by ecologists to estimate the size of wildlife populations. In the simplest setting, a sample of wildlife is captured, tagged, and released. At a later time, a second sample is drawn and its overlap with the first sample is determined. The sizes of the two samples and their overlap are used to estimate the size of the total population and the confidence interval associated with the estimate. In applications to human populations, capture-recapture draws samples from lists of members of subpopulations. Personal identifiers are used to measure the overlap of the samples, and statistical computations are used to estimate the size of the overall population. The greatest shortcoming of the capture-recapture approach is the width of the confidence intervals associated with its estimates. It is not unusual to find confidence intervals of ±50% of the population parameter, as illustrated by Abeni et al.
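The two-sample logic described above can be sketched with the standard Lincoln-Petersen estimator and Chapman's bias-corrected variant. This is a generic textbook sketch, not the specific procedure of Abeni et al. or of the present invention; the function names are hypothetical, and the confidence interval relies on a normal approximation, which is itself an assumption. Here `n1` and `n2` are the two sample sizes and `m` is the number of individuals found in both samples.

```python
import math
from typing import Tuple


def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classic two-sample estimate N = n1 * n2 / m (undefined when m == 0)."""
    if m <= 0:
        raise ValueError("need at least one recapture (m > 0)")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> Tuple[float, float, float]:
    """Chapman's bias-corrected estimate with an approximate 95% interval.

    Returns (estimate, lower, upper). The variance formula is Seber's
    variance for the Chapman estimator; the +/- 1.96 standard-error band
    is a normal approximation (an assumption, not an exact interval).
    """
    if not 0 <= m <= min(n1, n2):
        raise ValueError("recaptures must satisfy 0 <= m <= min(n1, n2)")
    n_hat = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    var = (n1 + 1) * (n2 + 1) * (n1 - m) * (n2 - m) / ((m + 1) ** 2 * (m + 2))
    half_width = 1.96 * math.sqrt(var)
    return n_hat, n_hat - half_width, n_hat + half_width


if __name__ == "__main__":
    # Hypothetical samples: 100 names on each list, 20 found on both.
    print(lincoln_petersen(100, 100, 20))
    print(chapman(100, 100, 20))
```

With `n1 = n2 = 100` and `m = 20`, the interval half-width comes out at roughly one third of the point estimate, which illustrates why small recapture counts produce the wide intervals noted above.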
A statistical procedure that addresses the problem of estimating the size of a population without a unique personal identifier has been provided by Larsen's maximum likelihood estimate of the solution to the classical occupancy problem. Larsen applied his solution to the estimation of the number of people represented in an anonymous Chlamydia registry in one county in Denmark. His solution provides less precise estimates and contains greater error than the solution provided by the present invention. In addition, his solution does not address the population overlap problem.
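For comparison, the general idea of estimating register size from the number of distinct birthdates can be sketched by inverting the occupancy expectation D = K[1 - (1 - 1/K)^N] for N. This is a simpler moment-type estimate, not Larsen's maximum likelihood estimate and not the method of the present invention. It assumes, hypothetically, that keys are independent and uniform over `key_space` values; the function name and the simulated data are illustrative only.

```python
import math
import random


def estimate_distinct_individuals(distinct_keys: int, key_space: int) -> float:
    """Moment-type occupancy estimate of the number of distinct individuals.

    Assumes (hypothetically) that each individual's key, e.g. a birthdate,
    is drawn independently and uniformly from `key_space` possible values.
    Solves D = K * (1 - (1 - 1/K) ** N) for N given the observed D.
    """
    if key_space < 2:
        raise ValueError("key_space must be at least 2")
    if not 0 <= distinct_keys < key_space:
        raise ValueError("need 0 <= distinct_keys < key_space")
    # log1p keeps precision when D/K and 1/K are small.
    return math.log1p(-distinct_keys / key_space) / math.log1p(-1 / key_space)


if __name__ == "__main__":
    rng = random.Random(0)
    key_space = 365 * 80  # hypothetical: birthdates spread over ~80 years
    people = [rng.randrange(key_space) for _ in range(5000)]  # one key per person
    # Each person appears in 1 to 3 records; duplicate records repeat the key.
    records = [key for key in people for _ in range(rng.randint(1, 3))]
    d = len(set(records))
    print(len(records), "records,", d, "distinct keys,",
          round(estimate_distinct_individuals(d, key_space)), "estimated people")
```

Duplicate records do not change the count of distinct keys, so the estimate tracks the number of people rather than the number of records; coincidental key collisions between different people are what the logarithmic correction accounts for.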
SUMMARY OF THE INVENTION
Accordingly, it is an object of the present invention to identify an accurate quantity of unique individuals (entities, objects, items, etc.) in a data source containing potentially multiple records pertaining to a particular individual.
It is a further object of the present invention to identify an accurate quantity of unique individuals (entities, objects, items, etc.) overlapping across multiple data sources which may contain multiple records pertaining to a particular individual within a single data source or within multiple data sources.
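The overlap objective can be illustrated, hypothetically, by combining an occupancy-based size estimate with inclusion-exclusion: estimate the number of distinct individuals in source A, in source B, and in the pooled keys of both, then take N_A + N_B - N_(A∪B). This is an illustrative sketch and is not taken from the microfiche appendix. It assumes that keys are uniform over `key_space` values and that the same individual carries the same key in both sources; all names are hypothetical.

```python
import math
from collections.abc import Hashable, Iterable


def occupancy_estimate(distinct_keys: int, key_space: int) -> float:
    """Distinct individuals implied by `distinct_keys` observed key values,
    assuming keys are drawn uniformly from `key_space` values (hypothetical)."""
    if key_space < 2:
        raise ValueError("key_space must be at least 2")
    if not 0 <= distinct_keys < key_space:
        raise ValueError("need 0 <= distinct_keys < key_space")
    return math.log1p(-distinct_keys / key_space) / math.log1p(-1 / key_space)


def estimate_overlap(keys_a: Iterable[Hashable], keys_b: Iterable[Hashable],
                     key_space: int) -> float:
    """Estimate how many individuals appear in both sources A and B.

    Inclusion-exclusion: |A and B| = N_A + N_B - N_(A or B), with each term
    estimated from distinct keys only, so repeated records of one person
    within or across sources do not inflate the counts.
    """
    set_a, set_b = set(keys_a), set(keys_b)
    n_a = occupancy_estimate(len(set_a), key_space)
    n_b = occupancy_estimate(len(set_b), key_space)
    n_union = occupancy_estimate(len(set_a | set_b), key_space)
    return n_a + n_b - n_union


if __name__ == "__main__":
    # Hypothetical sources whose distinct keys are 0-99 and 50-149.
    print(estimate_overlap(range(0, 100), range(50, 150), 36500))
```

For the example sources above the estimate is about 50 shared individuals; the occupancy correction is small here because collisions between different people are rare when the number of distinct keys is much smaller than the key space.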
It is another object of the present invention to determine a more precise and smaller range of variance of the quantity of unique
