
SYNTHETIC MULTIVARIATE DATA GENERATION PROCEDURE WITH VARIOUS OUTLIER SCENARIOS USING R PROGRAMMING LANGUAGE

A synthetic data generation procedure generates data from either a statistical or a mathematical model. Such procedures are used in simulation studies to compare the performance of statistical methods or to propose a new statistical method under a specific distribution. This study formulates a synthetic multivariate data generation procedure with various outlier scenarios using R. An outlier generating model is used to generate multivariate data that contain outliers, and the data generation procedures for the various outlier scenarios in R are explained. Three outlier scenarios are produced, and graphical representations of these scenarios using 3D scatterplots and Chernoff faces are shown. The graphical representations show that in Outlier Scenario 1, as the distance between outliers and inliers increases through a shift in the mean, the outliers and inliers become completely separated. The same pattern appears in Outlier Scenario 2 as the distance between outliers and inliers increases through a shift in the covariance. In Outlier Scenario 3, the separation of outliers and inliers becomes more apparent as both parameter values increase. The data generation procedure in this study will continue to be used in other applications, such as identifying outliers with a clustering method.
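The three scenarios can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and it is not the authors' procedure. It assumes a contaminated-normal form: inliers drawn from a standard normal with identity covariance, and outliers drawn from a normal whose mean is shifted, whose covariance is scaled, or both. The names `generate_data`, `mu_shift`, `cov_scale`, and `eps`, and all numeric values, are hypothetical and do not come from the source.

```python
import math
import random
import statistics


def generate_data(n, p, eps, mu_shift=0.0, cov_scale=1.0, seed=0):
    """Return (rows, labels) for n points in p dimensions.

    The first round(eps * n) points are outliers (label 1); the rest are
    inliers (label 0).

    Assumed model (an illustration, not taken from the source):
      inliers  ~ N(0, I_p)
      outliers ~ N(mu_shift * 1_p, cov_scale * I_p)
    """
    rng = random.Random(seed)
    n_out = round(eps * n)
    sd_out = math.sqrt(cov_scale)  # variance scale -> standard deviation
    rows, labels = [], []
    for i in range(n):
        if i < n_out:
            rows.append([rng.gauss(mu_shift, sd_out) for _ in range(p)])
            labels.append(1)
        else:
            rows.append([rng.gauss(0.0, 1.0) for _ in range(p)])
            labels.append(0)
    return rows, labels


def mean_distance_from_origin(rows, labels, label):
    """Average Euclidean norm of the points carrying the given label."""
    return statistics.mean(
        math.sqrt(sum(x * x for x in row))
        for row, lab in zip(rows, labels)
        if lab == label
    )


# Hypothetical settings: 10% contamination in three dimensions.
scenarios = {
    "Scenario 1 (mean shift)": dict(mu_shift=5.0),
    "Scenario 2 (covariance shift)": dict(cov_scale=25.0),
    "Scenario 3 (both)": dict(mu_shift=5.0, cov_scale=25.0),
}

for name, params in scenarios.items():
    rows, labels = generate_data(n=200, p=3, eps=0.1, **params)
    d_in = mean_distance_from_origin(rows, labels, 0)
    d_out = mean_distance_from_origin(rows, labels, 1)
    print(f"{name}: inliers {d_in:.2f}, outliers {d_out:.2f}")
```

Under these assumptions, raising `mu_shift` or `cov_scale` moves the outlier group farther from the inlier cloud on average, which is the kind of separation the 3D scatterplots and Chernoff faces are described as showing. A general covariance matrix would need a Cholesky factor in place of the independent draws used here.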

Related Results

Hubungan Perilaku Pola Makan dengan Kejadian Anak Obesitas [The Relationship Between Eating Behavior and the Incidence of Childhood Obesity]
Učinak poučavanja razrednomu jeziku u izobrazbi nastavnika njemačkoga [The Effect of Teaching Classroom Language in the Training of Teachers of German]
The actual use of classroom language is principally limited to the classroom environment. As far as foreign language learning is concerned, the classroom often turns out to be the ...
A New Single Linkage Robust Clustering Outlier Detection Procedures for Multivariate Data
Outliers are abnormal data, and the detection of outliers in multivariate data has always been of interest. Unlike univariate data, outlier detection for multivariate data is insuf...
A Monte Carlo-Based Outlier Diagnosis Method for Sensitivity Analysis
An iterative outlier elimination procedure based on hypothesis testing, commonly known as Iterative Data Snooping (IDS) among geodesists, is often used for the quality control of t...
A Monte Carlo-Based Outlier Diagnosis Method for Sensitivity Analysis
An iterative outlier elimination procedure based on hypothesis testing, commonly known as Iterative Data Snooping (IDS) among geodesists, is often used for the quality control of m...
Optimasi Algoritma K-Nearest Neighbors Berdasarkan Perbandingan Analisis Outlier (Berbasis Jarak, Kepadatan, LOF) [Optimization of the K-Nearest Neighbors Algorithm Based on a Comparison of Outlier Analyses (Distance-Based, Density-Based, LOF)]
The current growth of data affects data analysis in many fields, such as astronomy, business, medicine, education, and finance. The collected data ...
Investigating Outlier Detection Techniques Based on Kernel Rough Clustering
Background: Data quality is crucial to the success of big data analytics. However, the presence of outliers affects data quality and data analysis. Employing effective outlier dete...
