Combining Clustering and Factor Analysis as Complementary Techniques

The study of driver behavior and associated accidents has been of interest to researchers and insurance companies. For insurance companies, identifying the factors that contribute to traffic violations plays a significant role in providing quotes, because it establishes the basis for charging customers appropriate rates. This study assesses traffic violation intensity for 64 counties in the state of Florida, USA, using a publicly available traffic violations data set. The data set consists of 3,669,796 records with 11 attributes, including race, gender, driver's age, and type of driving violation. The 187 types of traffic violations are grouped into 11 broad categories. Two machine learning algorithms, factor analysis and k-means clustering, were applied. Factor analysis was used to develop a new comprehensive traffic violation index (TVI), which quantifies the traffic violation intensity of each county. All counties in the data set were ranked by TVI score, and the counties with high scores were identified. The k-means clustering algorithm was then applied to the same data, yielding four clusters of counties. The counties in each cluster were compared against the TVI scores to check whether counties in the same cluster had similar scores. The counties with the highest TVI scores were grouped in one cluster, the counties with the next-highest scores in a second cluster, and so on. The results of the two models therefore match perfectly. The two techniques complement each other: k-means clustering groups counties with comparable traffic violation intensities, while factor analysis also ranks individual counties by TVI. Together they identify the counties with high traffic violation intensities, which helps policymakers take adequate traffic management measures.
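The workflow described in the abstract (build an index from factor scores, cluster the same data, then compare cluster membership against the index) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the index uses only the leading eigenvector of the correlation matrix (principal-component extraction, a common first step of factor analysis) rather than the paper's actual TVI formula, and the k-means routine is a plain Lloyd's iteration. All function and variable names are hypothetical.

```python
import numpy as np


def standardize(X):
    """Z-score each column so every violation category is on the same scale."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def leading_factor_index(Z):
    """Score each row on the leading eigenvector of the correlation matrix.

    Assumed stand-in for the TVI: principal-component extraction with a
    single factor, not the formula used in the paper.
    """
    corr = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    loadings = eigvecs[:, -1]                # eigenvector of the largest eigenvalue
    if loadings.sum() < 0:                   # resolve the arbitrary sign
        loadings = -loadings
    return Z @ loadings


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with seeded random initial centers."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_summary(index, labels):
    """Per cluster: (label, size, mean, min, max) of the index, highest mean first."""
    rows = []
    for c in np.unique(labels):
        values = index[labels == c]
        rows.append((int(c), int(values.size), float(values.mean()),
                     float(values.min()), float(values.max())))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Synthetic stand-in data: 64 "counties" x 11 violation-category rates,
# driven by a hypothetical latent intensity with four tiers.
rng = np.random.default_rng(42)
n_counties, n_categories = 64, 11
intensity = np.repeat([0.0, 1.0, 2.0, 3.0], n_counties // 4)
true_loadings = rng.uniform(0.5, 1.5, size=n_categories)
X = np.outer(intensity, true_loadings) + rng.normal(0.0, 0.3, size=(n_counties, n_categories))

Z = standardize(X)
tvi = leading_factor_index(Z)   # index score per county
ranking = np.argsort(-tvi)      # county positions, highest index first
labels = kmeans(Z, k=4)         # cluster label per county
summary = cluster_summary(tvi, labels)

for cluster, size, mean, low, high in summary:
    print(f"cluster {cluster}: n={size:2d}  mean={mean:6.2f}  range=[{low:6.2f}, {high:6.2f}]")
```

If the index ranges printed for the clusters do not overlap, the two methods agree in the sense the abstract describes: each cluster corresponds to a contiguous band of the ranking. Because this k-means uses random initial centers, a different seed may produce a different partition, so agreement on synthetic data says nothing about the Florida results.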

Related Results

The Kernel Rough K-Means Algorithm
Background: Clustering is one of the most important data mining methods. The k-means (c-means) algorithm and its derivative methods are a hotspot in the field of clustering research in re...
Optimizing machine learning techniques for genomics clustering
In the field of bioinformatics, clustering is an effective technique...
Image clustering using exponential discriminant analysis
Local learning based image clustering models are usually employed to deal with images sampled from the non‐linear manifold. Recently, linear discriminant analysis (LDA) based vario...
A Proposed Clustering Algorithm for Efficient Clustering of High-Dimensional Data
To partition transaction data values, clustering algorithms are used. To analyse the relationships between transactions, similarity measures are utilized. Similarity models based o...
ANALISIS KETERKAITAN KEKERASAN DENGAN PERBUATAN CABUL TERHADAP ANAK [Analysis of the Relationship between Violence and Indecent Acts against Children]
Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...
