
Combining Clustering and Factor Analysis as Complementary Techniques

Driver behavior and the accidents associated with it have been of interest to researchers and insurance companies. For insurers, identifying the factors that contribute to traffic violations plays a significant role in quoting premiums, because it establishes the basis for charging customers appropriate rates. This study assesses traffic violation intensity for 64 counties in the state of Florida, USA, using a publicly available traffic violations data set. The data set consists of 3,669,796 records with 11 attributes, including race, gender, driver's age, and type of driving violation. The 187 types of traffic violations are grouped into 11 broad categories. Two machine learning techniques, factor analysis and k-means clustering, were applied. Factor analysis was used to develop a new comprehensive traffic violation index (TVI) that quantifies the traffic violation intensity of each county. All counties in the data set were ranked by TVI score, and the counties with high scores were identified. The k-means clustering algorithm was then applied to the same data, yielding four clusters of counties. The counties in each cluster were compared against the TVI scores to check whether counties grouped together had similar scores. The counties with the highest TVI scores were found in one cluster, the counties with the next-highest scores in a second cluster, and so on, so the authors observe a perfect match between the results of the two models. The two techniques are complementary: k-means clustering groups counties with comparable traffic violation intensities, while factor analysis additionally ranks individual counties by TVI. Together they identify the counties with high traffic violation intensities, which can help policymakers take adequate traffic management measures.
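The workflow described in the abstract (build an index from a factor model, cluster the same data with k-means, then check whether the clusters line up with the index ranking) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the authors' implementation. The abstract does not state the factor extraction method or the k-means initialization, so the sketch assumes a one-factor principal-component extraction (the dominant eigenvector of the correlation matrix, found by power iteration) as a stand-in for factor analysis, and a deterministic farthest-first initialization for k-means. The data are synthetic: 12 hypothetical counties, 3 hypothetical violation categories, and four well-separated intensity tiers chosen so that the check succeeds. All function and variable names are illustrative.

```python
import random
import statistics

def standardize(rows):
    """Z-score each column (violation category) across counties."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def first_factor_weights(z, iters=200):
    """Unit-length dominant eigenvector of the correlation matrix.

    Assumption: a one-factor principal-component extraction stands in
    for the factor analysis used in the study.
    """
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    w = [1.0] * p
    for _ in range(iters):  # power iteration
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return w

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; farthest-first initial centers (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign every county to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([statistics.fmean(col) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:  # centers stopped moving
            break
        centers = updated
    return labels

def clusters_match_ranking(scores, labels):
    """True if no two clusters overlap in their range of index scores."""
    ranges = sorted(
        (min(s for s, lab in zip(scores, labels) if lab == c),
         max(s for s, lab in zip(scores, labels) if lab == c))
        for c in set(labels)
    )
    return all(hi < next_lo for (_, hi), (next_lo, _) in zip(ranges, ranges[1:]))

# Synthetic data: all values below are hypothetical, not taken from the study.
rng = random.Random(0)
loadings = [1.0, 0.8, 1.2]   # hypothetical category sensitivities
levels = [1, 4, 7, 10]       # four hypothetical intensity tiers
counts = [[load * lvl + rng.uniform(-0.2, 0.2) for load in loadings]
          for lvl in levels for _ in range(3)]   # 12 synthetic counties

z = standardize(counts)
weights = first_factor_weights(z)
tvi = [sum(w * x for w, x in zip(weights, row)) for row in z]  # index per county
labels = kmeans(z, k=4)

for score, label in sorted(zip(tvi, labels), reverse=True):
    print(f"index={score:+.2f}  cluster={label}")
print("clusters consistent with ranking:", clusters_match_ranking(tvi, labels))
```

On real county data the tiers would not be this cleanly separated, so the consistency check is the informative part: it tests whether each cluster occupies its own contiguous band of index scores, which is the kind of agreement the abstract reports. In practice a statistics library would normally replace the hand-written factor extraction and clustering.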

IGI Global

Lakshmi Prayaga, Krishna Devulapalli, Chandra Prayaga

International Journal of Data Analytics

2020


