
Assessing the Performance of a Long Short-Term Memory Algorithm in the Dataset with Missing Values

This study assessed the performance of a long short-term memory (LSTM) algorithm, which is well suited to time series prediction, on a multivariate dataset with missing values. The full dataset for the LSTM model was prepared by running a widely used watershed model, the Hydrological Simulation Program-Fortran (HSPF), for the upper Nam River Basin on a daily time step for the 3 years from 2016 to 2018, excluding a one-year warm-up period. Prediction accuracy was evaluated in response to various interpolation methods, to changes in the number of missing values in the dependent variables, and to changes in the number of independent variables containing a fixed number of missing values (in either a single variable or multiple variables). The entire dataset was divided into training and test datasets at a ratio of 7:3. The results showed that the choice of interpolation method led to considerable variation in the performance of the LSTM model. Among the methods tested, StructTS and RPART were selected as the best imputation methods for recovering missing values of discharge and total phosphorus, respectively. The prediction error of the LSTM model increased gradually as the number of missing values increased from 300 to 700. The LSTM model nevertheless appeared to maintain its performance fairly well even on datasets with a large number of missing values, as long as an adequate interpolation method was adopted for each dependent variable. The performance of the LSTM model degraded further as the number of independent variables containing the fixed number of missing values increased from 1 to 7. We believe that the proposed methodology can be used not only to reconstruct missing values in real-time monitoring datasets with excellent performance, but also to improve the prediction accuracy of (time series) deep learning models.
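The evaluation loop described in the abstract (remove a known number of values, impute them, then split 7:3) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the series is synthetic rather than HSPF output, plain linear interpolation stands in for the StructTS and RPART methods named above, the split is taken chronologically (the abstract does not say how it was drawn), and the helper names `inject_missing`, `linear_impute`, `chronological_split`, and `rmse` are hypothetical.

```python
import numpy as np


def inject_missing(series, n_missing, rng):
    """Return a copy of the series with n_missing random positions set to NaN."""
    out = series.astype(float).copy()
    idx = rng.choice(len(out), size=n_missing, replace=False)
    out[idx] = np.nan
    return out


def linear_impute(series):
    """Fill NaNs by linear interpolation over the time index.

    Stand-in only: the study compared several imputation methods
    and selected StructTS and RPART, which are not reproduced here.
    """
    t = np.arange(len(series))
    known = ~np.isnan(series)
    return np.interp(t, t[known], series[known])


def chronological_split(series, train_fraction=0.7):
    """Split into leading training part and trailing test part (assumed ordering)."""
    cut = int(round(len(series) * train_fraction))
    return series[:cut], series[cut:]


def rmse(a, b):
    """Root-mean-square error between two equal-length arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(0)
n_days = 1096  # daily steps for 2016-2018 (366 + 365 + 365)
t = np.arange(n_days)
# Synthetic seasonal signal plus noise; NOT the HSPF-simulated data.
full = 10.0 + 5.0 * np.sin(2 * np.pi * t / 365.25) + rng.normal(0, 0.5, n_days)

errors = {}
for n_missing in (300, 700):  # the two endpoints quoted in the abstract
    gappy = inject_missing(full, n_missing, rng)
    filled = linear_impute(gappy)
    errors[n_missing] = rmse(filled, full)

train, test = chronological_split(filled, train_fraction=0.7)
print(errors, len(train), len(test))
```

In a fuller experiment, `filled` would feed the LSTM and the error would be measured on the model's predictions rather than on the imputed values themselves; this sketch stops at the imputation step.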

Related Results

Introduction to the Tafel v-bis Dataset: Death Duty Summary Information for The Netherlands, 1921
Abstract This article introduces a newly constructed dataset (i.e. the Tafel v-bis Dataset) containing summary information for all Dutch citizens who died in 1921 and were subject ...
Optimizing Random Forests: Spark Implementations of Random Genetic Forests
The Random Forest (RF) algorithm, originally proposed by Breiman [7], is a widely used machine learning algorithm that gains its merit from its fast learning speed as well as high ...
Evaluation of the Quality of Robust Clustering Algorithm TCLUST on the Example of Dataset of Air Pollutants Emission in Krakow
Data acquisition and collection is currently a very dynamic process. In order to obtain useful information from data, when huge quantities of data, the processing of the data is ...
A Comparative Study of Some Selected Classifiers on an Imbalanced Dataset for Sentiment Analysis
Extracting subjective data from online user generated text documents is made quite easy with the use of sentiment analysis. For a classification task different individual algorithm...
Padova Emotional Dataset of Facial Expressions (PEDFE): A unique dataset of genuine and posed emotional facial expressions
Abstract Facial expressions are among the most powerful signals for human beings to convey their emotional states. Indeed, emotional facial datasets represent the most effective and...
Schubert Winterreise Dataset
This article presents a multimodal dataset comprising various representations and annotations of Franz Schubert’s song cycle Winterreise. Schubert’s semina...
Embodying Memory: Intersections Between Sri Lankan Performance Art and Prosthetic Memory
In positing that memories acquired through visual media can impact subjectivity and alter worldview, prosthetic memory relies on an individual’s ability to build connections with t...
Understanding Anti-performance: The performative division of experience and the standpoint of the non-performer
Performance theorists have long been drawn to the potential of performance to subvert established institutions. The results of performance are never fully determined in advance; pe...
