The Efficiency of the Hierarchical Clustering Method

The Hierarchical Clustering Method (HCM) has been the main algorithm for clustering asteroids into groups of similar origin, i.e. into families, since the early 1990s [1][2]. With surveys such as NEOWISE, GAIA and the future LSST providing large amounts of asteroid observations, it is necessary to analyse how efficiently the HCM handles these far larger volumes of data. Our work aims to characterise the effectiveness of the HCM with respect to an asteroid family’s age, its location, and the density of the background in which it is situated.

The efficiency of an algorithm can be characterised primarily by two quantities, its ‘accuracy’ and its ‘precision’, and these are the metrics we have used to assess the HCM. Accuracy measures how completely background and family asteroids are correctly identified. Precision measures the ratio of correctly to incorrectly identified asteroids within a sample from the data set. Accuracy is useful for estimating the number of family members that the HCM will have missed, and precision for the number of interlopers that have contaminated the sample the HCM returns.

This experiment was undertaken using a synthetic background generated by Rogerio Deienno, akin to the one used in his investigation into the Yarkovsky V-Shape Method [3]. Four families were generated at different points in the main belt, with ages varying from 10 Myr to 4.5 Gyr. The HCM was run with cut-off velocities ranging from 10 to 1000 m/s. The cut-off with the highest overall accuracy was taken, and the precision at this point was measured.

Our results show that position within the main belt made little difference to the efficiency of the algorithm. At the theoretical peak accuracy, the youngest families returned accuracy values ranging from 80-91%, with precision varying between 90-95%.
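The abstract does not give formulas for the two metrics, so the following is only a minimal sketch that assumes the standard confusion-matrix definitions: accuracy as the fraction of all asteroids (family and background) that are labelled correctly, and precision as the fraction of the returned sample that truly belongs to the family. The names `Confusion`, `confusion`, `accuracy` and `precision` are hypothetical and are not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # family members recovered by the clustering
    fp: int  # background asteroids wrongly linked (interlopers)
    fn: int  # family members the clustering missed
    tn: int  # background asteroids correctly left out


def confusion(true_family: set, returned: set, all_ids: set) -> Confusion:
    """Count outcomes, assuming both input sets are subsets of all_ids."""
    tp = len(true_family & returned)
    fp = len(returned - true_family)
    fn = len(true_family - returned)
    tn = len(all_ids) - tp - fp - fn
    return Confusion(tp, fp, fn, tn)


def accuracy(c: Confusion) -> float:
    """Assumed definition: correctly labelled asteroids over all asteroids."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: Confusion) -> float:
    """Assumed definition: true members over everything the method returned."""
    returned = c.tp + c.fp
    return c.tp / returned if returned else 0.0


# Toy example: 10 asteroids, 4 true family members, 4 returned (1 interloper).
all_ids = set(range(10))
true_family = {0, 1, 2, 3}
returned = {0, 1, 2, 7}
c = confusion(true_family, returned, all_ids)
print(accuracy(c), precision(c))
```

In this toy case one family member is missed and one interloper is picked up, which lowers both numbers. Under these assumed definitions a large background can keep accuracy high even when many members are missed, so the definitions used in the study may differ.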
No family older than 1.5 Gyr had an accuracy greater than 50%, and families older than 2.5 Gyr scored maximum values of 19% or lower. The precision for older families varied greatly as the cut-off velocity was increased. By limiting the cut-off velocity it was possible to keep the precision higher; however, this came at the expense of accuracy, trading family completeness for a reduction in interlopers.

The HCM algorithm suffers from a problem of ‘chaining’. When the algorithm branches out from a parent body to agglomerate nearby asteroids, clusters of nearby interloper asteroids act as a bridge that can cause the branching to extend outside the family. This problem worsens as families age and dissipate, or when there are more background asteroids to interfere and act as interlopers. With growing datasets this problem will be exacerbated, and the continued use of the HCM ‘as is’ as a tool for asteroid family clustering must be addressed.

Figure 1: The plot depicts the variation of the peak accuracy of the HCM with respect to a family’s position within the main belt and its age. Older families are lighter coloured, and younger families darker.

References:
[1] Zappala, V., Cellino, A., Farinella, P., and Knezevic, Z. (1990). Asteroid Families. I. Identification by Hierarchical Clustering and Reliability Assessment. The Astronomical Journal, vol. 100, IOP, p. 2030. doi:10.1086/115658.
[2] Zappala, V., Cellino, A., Farinella, P., and Milani, A. (1994). Asteroid Families. II. Extension to Unnumbered Multiopposition Asteroids. Astronomical Journal (ISSN 0004-6256), vol. 107, no. 2, p. 772-801. 1994AJ....107..772Z.
[3] Deienno, R., Walsh, K., and Delbo, M. (2021). Efficiency characterization of the V-shape asteroid family detection method. Icarus, vol. 357, 114218. doi:10.1016/j.icarus.2020.114218.
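The chaining behaviour described in the abstract can be illustrated with a toy sketch. It assumes a plain Euclidean distance in a two-dimensional space with units of m/s, used here only as a stand-in for the velocity metric applied to proper orbital elements. The function name `hcm_grow` and the eight points are hypothetical and chosen purely for illustration.

```python
import math
from collections import deque


def hcm_grow(points, parent, v_cut):
    """Grow a cluster from `parent`, linking any point within v_cut of a member.

    This is a friends-of-friends (single-linkage) growth with a fixed cut-off;
    the distance is a toy Euclidean one, not the real orbital-element metric.
    """
    members = {parent}
    queue = deque([parent])
    while queue:
        i = queue.popleft()
        for j, p in enumerate(points):
            if j not in members and math.dist(points[i], p) <= v_cut:
                members.add(j)
                queue.append(j)
    return members


# Indices 0-3: compact family (30 m/s spacing).
# Indices 4-5: sparse interlopers forming a bridge (50 m/s spacing).
# Indices 6-7: unrelated background clump.
points = [(0, 0), (30, 0), (60, 0), (90, 0),
          (140, 0), (190, 0),
          (240, 0), (270, 0)]

print(sorted(hcm_grow(points, parent=0, v_cut=40)))  # family only
print(sorted(hcm_grow(points, parent=0, v_cut=60)))  # chains into background
```

With the 40 m/s cut-off the growth stops at the edge of the family. Raising it to 60 m/s lets the two interlopers bridge the gap, so the cluster absorbs the background clump as well. This is the trade-off between completeness and contamination that the abstract describes.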

Copernicus GmbH

Andrew Marshall-Lee, Marco Delbo, Apostolos Christou, Rogerio Deienno, Kevin Walsh

2024
