Survey data integration for distribution function and quantile estimation
Abstract
Estimates of finite population cumulative distribution functions (CDFs) and quantiles are critical for policy-making, resource allocation, and public health planning. For instance, federal finance agencies may require accurate estimates of the proportion of individuals with income below the federal poverty line to determine funding eligibility, while health organizations may rely on precise quantile estimates of key health variables to guide local health interventions. Despite growing interest in survey data integration, research on the integration of probability and nonprobability samples to estimate CDFs and quantiles remains limited. In this study, we propose a novel residual-based CDF estimator that integrates information from a probability sample with data from potentially large nonprobability samples. Our approach leverages shared covariates observed in both datasets, while the response variable is available only in the nonprobability sample. Using a semiparametric approach, we train an outcome model on the nonprobability sample and incorporate model residuals with sampling weights from the probability sample to estimate the CDF of the target variable. Based on this CDF estimator, we define a quantile estimator and introduce linearization and bootstrap methods for variance estimation of both the CDF and quantile estimators. Under certain regularity conditions, we establish the asymptotic properties, including bias and variance, of the CDF estimator. Our empirical findings support the theoretical results and demonstrate the favorable performance of the proposed estimators relative to plug-in mass imputation estimators and the naïve estimators derived from the nonprobability sample only. A real data example is presented to illustrate the proposed estimators.
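The residual-based idea described above can be illustrated with a minimal sketch. The abstract does not give the estimator's formula, so everything here is an assumption made for illustration: the outcome model is taken to be ordinary least squares (the paper's semiparametric model may differ), the CDF at a threshold t is taken to be a weighted average, over probability-sample units, of the share of nonprobability-sample residuals e_j satisfying prediction_i + e_j <= t, and the quantile is read off a user-supplied grid. The function names `residual_cdf` and `residual_quantile` and all argument names are hypothetical.

```python
import numpy as np


def residual_cdf(t, x_prob, w_prob, x_np, y_np):
    """Illustrative residual-based CDF estimate at threshold(s) t.

    x_prob, w_prob: covariates and sampling weights from the probability sample.
    x_np, y_np: covariates and responses from the nonprobability sample.
    Assumed form only; not taken from the paper.
    """
    x_prob, w_prob = np.asarray(x_prob, float), np.asarray(w_prob, float)
    x_np, y_np = np.asarray(x_np, float), np.asarray(y_np, float)

    # Assumed outcome model: least squares fitted on the nonprobability sample.
    design_np = np.column_stack([np.ones(len(x_np)), x_np])
    beta, *_ = np.linalg.lstsq(design_np, y_np, rcond=None)
    sorted_res = np.sort(y_np - design_np @ beta)

    # Predictions for the probability sample, where the response is unobserved.
    design_prob = np.column_stack([np.ones(len(x_prob)), x_prob])
    preds = design_prob @ beta

    # Share of residuals e_j with preds_i + e_j <= t, for every (t, i) pair.
    t = np.atleast_1d(np.asarray(t, float))
    gaps = t[:, None] - preds[None, :]
    shares = np.searchsorted(sorted_res, gaps, side="right") / len(sorted_res)

    # Weighted average over probability-sample units (weights normalized to 1).
    return shares @ w_prob / w_prob.sum()


def residual_quantile(p, x_prob, w_prob, x_np, y_np, grid):
    """Smallest grid value whose estimated CDF is at least p (grid-based sketch)."""
    grid = np.sort(np.asarray(grid, float))
    cdf = residual_cdf(grid, x_prob, w_prob, x_np, y_np)
    idx = np.searchsorted(cdf, p, side="left")
    return grid[min(idx, len(grid) - 1)]
```

In this sketch the response is used only through the fitted model and its residuals, while the sampling weights enter only through the probability sample, which mirrors the division of roles described in the abstract. Variance estimation by linearization or bootstrap is not shown.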
Springer Science and Business Media LLC
Related Results
M-quantile estimation and discriminant analysis for heteroscedastic processes
Drawing on techniques in the time and frequency domains, ...
Quantile-based Reliability Measures and Some Associated Stochastic Orderings
There are several statistical models which have explicit quantile functions, but do not have manageable cumulative distribution functions. For example, Govindarajulu, various forms...
Modified Quantile Regression for Modeling the Low Birth Weight
This study aims to identify the best model of low birth weight by applying and comparing several methods based on the quantile regression method's modification. The birth weight da...
Determinants of Capital Structure: A Quantile Regression Analysis
Abstract
In this study, we attempted to analyze the determinants of capital structure for Indian firms using a panel framework and to investigate whether the capita...
A quantile regression forecasting model for ICT development
Purpose
– Because quantile regression gets more popular and provides more comprehensive interpretations, it is important to advance quantile regression for forecast...
MEASURING THE IMPACT OF TAU VECTOR ON PARAMETER ESTIMATES IN THE PRESENCE OF HETEROSCEDASTIC DATA IN QUANTILE REGRESSION ANALYSIS
The ordinary least squares (OLS) regression models only the conditional mean of the response and is computationally less expensive. Quantile regression on the other hand is more ex...
The Psychometric Properties of Probability and Quantile Forecasts
Forecasting tournaments are a well established method for assessing human forecasting skills. Most forecasting tournaments are based on a format where participants estimate the pro...

