Missing Values Compensation in Duplicates Detection Using Hot Deck

Abstract: Duplicate records are a well-known problem in data sets, especially in high-volume databases. The accuracy of duplicate detection determines the efficiency of the duplicate removal process. Detection becomes harder when records contain missing values: during clustering and matching, missing values can cause similar records to be assigned to the wrong group, leaving the duplicates undetected. In this paper, we show how duplicate detection can be improved in the presence of missing values using our Duplicates Detection within the Incomplete Data set (DDID) method. We artificially introduce missing values into the key attributes of the two data sets under study, using an arbitrary pattern, to simulate both complete and incomplete data sets. We then evaluate the performance of duplicate detection when the Hot Deck method is used to compensate for the missing values in the key attributes. We hypothesize that Hot Deck improves duplicate detection performance. The performance of DDID is compared, in terms of accuracy and speed, with an earlier duplicate detection method, DuDe, which serves as the benchmark. The experimental findings show that, even when the data sets are incomplete, DDID offers more accurate and faster duplicate detection than DuDe. The results of this study contribute to duplicate detection under the constraint of incomplete data sets.
Springer Science and Business Media LLC
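
The abstract's core argument, that a missing value in a key attribute sends a record to the wrong group and that Hot Deck imputation can repair this, can be illustrated with a minimal sketch. This is not the DDID method itself, whose details the abstract does not give. It assumes a simple sequential hot deck (a missing value is filled from the nearest preceding donor once records are sorted by another attribute), and the field names `name` and `city` and the helpers `hot_deck_impute` and `block_by` are hypothetical.

```python
from typing import Optional

Record = dict[str, Optional[str]]


def hot_deck_impute(records: list[Record], key: str, sort_by: str) -> list[Record]:
    """Sequential hot deck (assumed variant): sort by `sort_by`, then fill each
    missing `key` with the most recently seen non-missing value (the donor)."""
    ordered = sorted(records, key=lambda r: r[sort_by] or "")
    donor: Optional[str] = None
    result: list[Record] = []
    for rec in ordered:
        filled = dict(rec)  # copy so the input records are left untouched
        if filled[key] is None:
            if donor is not None:
                filled[key] = donor  # borrow the donor's value
        else:
            donor = filled[key]  # this record becomes the current donor
        result.append(filled)
    return result


def block_by(records: list[Record], key: str) -> dict[Optional[str], list[Record]]:
    """Group records by the key attribute; only records that share a group
    would be compared as duplicate candidates."""
    blocks: dict[Optional[str], list[Record]] = {}
    for rec in records:
        blocks.setdefault(rec[key], []).append(rec)
    return blocks


# Hypothetical toy data: records 1 and 2 are duplicates, but record 2 lacks `city`.
records: list[Record] = [
    {"id": "1", "name": "ann lee", "city": "oslo"},
    {"id": "2", "name": "ann lee", "city": None},
    {"id": "3", "name": "bob ray", "city": "rome"},
]

blocks_raw = block_by(records, "city")
blocks_imputed = block_by(hot_deck_impute(records, "city", "name"), "city")
```

Without imputation, record 2 falls into its own group under the missing key and is never compared with record 1. After the hot-deck step it inherits the donor's `city` and shares a group with record 1, so a matching step would get the chance to flag the pair.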