Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis
Abstract
Background
The COVID-19 pandemic has revealed significant challenges in disease forecasting and in developing a public health response, highlighting the need to manage missing data from various sources to make accurate forecasts.
Objective
We aimed to show how handling missing data can affect estimates of the COVID-19 incidence rate (CIR) in different pandemic situations.
Methods
This study used data from the COVID-19/SARS-CoV-2 surveillance system at the National Institute of Hygiene and Epidemiology, Vietnam. We divided the available data set into 3 distinct periods: zero COVID-19, transition, and new normal. For the daily COVID-19 caseload variable, we simulated data missing completely at random by randomly removing 5% to 30% of the values, in increments of 5%. We selected 7 methods for handling missing data to assess their effects and calculated statistical and epidemiological indices to measure the effectiveness of each method.
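The masking step in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes the daily caseload is a plain list of numbers, marks a removed day as `None`, and fixes the random seed for reproducibility. The helper `mask_mcar` and the toy series are hypothetical.

```python
import random
from typing import List, Optional

# Missingness levels: 5% to 30% in 5% steps (as described in the Methods).
LEVELS = [k / 100 for k in range(5, 35, 5)]


def mask_mcar(series: List[float], fraction: float, seed: int = 0) -> List[Optional[float]]:
    """Hypothetical helper: return a copy of `series` with `fraction` of its
    entries replaced by None. Positions are drawn uniformly at random,
    independent of the values, so the missingness is MCAR."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)
    n_missing = round(fraction * len(series))
    dropped = set(rng.sample(range(len(series)), n_missing))
    return [None if i in dropped else x for i, x in enumerate(series)]


if __name__ == "__main__":
    # Toy daily caseload (hypothetical values, not study data).
    daily_cases = [float(x) for x in range(1, 101)]
    for level in LEVELS:
        masked = mask_mcar(daily_cases, level, seed=42)
        n_removed = sum(v is None for v in masked)
        print(f"{level:.0%} missing: {n_removed} of {len(masked)} days removed")
```

Because the removed positions are chosen without looking at the values, each masked copy satisfies the missing-completely-at-random assumption by construction; repeating the draw with different seeds would give a distribution of results per missingness level.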
Results
Our study examined missing data imputation performance across 3 study periods: zero COVID-19 (n=3149), transition (n=1290), and new normal (n=9288). K-nearest neighbor (KNN) imputation had the lowest mean absolute percentage change (APC) in the CIR across the full range (5% to 30%) of missing data. For instance, with 15% missing data, KNN produced average biases of 10.6%, 10.6%, and 9.7% in the zero COVID-19, transition, and new normal periods, respectively, compared with 39.9%, 51.9%, and 289.7% for the maximum likelihood method. When missing data in the zero COVID-19 period were imputed, the autoregressive integrated moving average (ARIMA) model showed the greatest mean APC in the mean number of confirmed COVID-19 cases per COVID-19 containment cycle (CCC), rising from 226.3% at the 5% missing level to 6955.7% at the 30% missing level. Median imputation had the lowest bias in the mean number of confirmed cases per CCC at all levels of missing data. In the 20% missing scenario, for example, median imputation had an average bias of 16.3% in confirmed cases per CCC, lower than that of KNN, whereas maximum likelihood imputation had the highest average bias, at 92.4%. During the new normal period, in the 25% and 30% missing data scenarios, KNN imputation had average biases of 21% to 32% for both the CIR and confirmed cases per CCC, whereas maximum likelihood and moving average imputation had average biases above 250% for both measures.
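The comparison between imputation methods can be illustrated with simplified stand-ins for two of them. This sketch is not the study's implementation. The abstract does not give the CIR formula, the KNN feature set, or the exact APC formula, so it assumes (1) CIR = total confirmed cases x 100,000 / population, (2) a univariate KNN that fills each missing day with the mean of the k observed days closest in time, and (3) APC = |estimate - complete-data value| / complete-data value x 100. All function names, the toy series, and the population are hypothetical.

```python
from statistics import median
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing day


def impute_median(s: Series) -> List[float]:
    """Fill each missing day with the median of the observed days."""
    m = median(v for v in s if v is not None)
    return [m if v is None else v for v in s]


def impute_knn_time(s: Series, k: int = 3) -> List[float]:
    """Simplified univariate KNN (an assumption, not the study's setup):
    fill each missing day with the mean of the k observed days closest in
    time, breaking distance ties in favor of the earlier day."""
    observed = [(i, v) for i, v in enumerate(s) if v is not None]
    out: List[float] = []
    for i, v in enumerate(s):
        if v is not None:
            out.append(v)
            continue
        nearest = sorted(observed, key=lambda p: (abs(p[0] - i), p[0]))[:k]
        out.append(sum(x for _, x in nearest) / len(nearest))
    return out


def cir(series: List[float], population: float, per: float = 100_000) -> float:
    """Assumed CIR: total confirmed cases per `per` population."""
    return sum(series) * per / population


def apc(estimate: float, reference: float) -> float:
    """Absolute percentage change of an estimate vs. the complete-data value."""
    return abs(estimate - reference) / abs(reference) * 100.0


if __name__ == "__main__":
    # Hypothetical toy data: a complete series and a copy with 2 days removed.
    complete = [10.0, 12.0, 15.0, 20.0, 26.0, 30.0, 28.0, 24.0, 18.0, 14.0]
    masked: Series = [10.0, 12.0, None, 20.0, 26.0, None, 28.0, 24.0, 18.0, 14.0]
    population = 1_000_000  # hypothetical
    reference = cir(complete, population)
    for name, fill in [("median", impute_median), ("KNN (time)", impute_knn_time)]:
        estimate = cir(fill(masked), population)
        print(f"{name}: APC in CIR = {apc(estimate, reference):.1f}%")
```

Because APC is measured against the estimate from the complete series, the same mask, impute, and compare loop can be repeated for each missingness level and period to tabulate biases of the kind reported above.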
Conclusions
Our findings emphasize that investigators should tailor their choice of imputation method to the specific epidemiological context and data collection environment to ensure reliable estimates of the CIR.