
Resource efficient distributed computing

University of Massachusetts Dartmouth
Description:
There is a surge of interest in distributed computing, driven by advances in cluster computing and big data technology.
My research explores machine learning and big data techniques for learning with decentralized resources.
One topic in distributed learning is distributing large-scale centralized computation across clusters or multi-core computers.
We propose random projection forests (rpForests), a method for fast k-nearest-neighbor (kNN) search.
RpForests finds nearest neighbors by combining multiple kNN-sensitive trees, each constructed recursively through a series of random projections.
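The Python sketch below illustrates the idea under stated assumptions. Each tree splits a node at the median of its points' projections onto a Gaussian random direction until leaves hold at most `leaf_size` points. A query takes the union of the leaves it reaches in every tree as its candidate set and ranks those candidates exactly. The splitting rule, the candidate-union strategy, and all names (`build_rp_tree`, `leaf_for`, `build_rp_forest`, `rp_forest_knn`) are hypothetical. The abstract does not specify these details.

```python
import numpy as np

def build_rp_tree(X, idx, leaf_size, rng):
    """Recursively split rows X[idx] at the median of a random projection.
    Internal nodes are tuples (direction, split, left, right); leaves are index arrays."""
    if len(idx) <= leaf_size:
        return idx
    direction = rng.standard_normal(X.shape[1])        # random Gaussian direction
    proj = X[idx] @ direction
    split = np.median(proj)
    left, right = idx[proj <= split], idx[proj > split]
    if len(left) == 0 or len(right) == 0:               # ties: cannot split further
        return idx
    return (direction, split,
            build_rp_tree(X, left, leaf_size, rng),
            build_rp_tree(X, right, leaf_size, rng))

def leaf_for(tree, q):
    """Route query q to the leaf it falls into, using the same rule as construction."""
    while isinstance(tree, tuple):
        direction, split, left, right = tree
        tree = left if q @ direction <= split else right
    return tree

def build_rp_forest(X, n_trees=10, leaf_size=20, seed=0):
    """Hypothetical helper: n_trees independent random-projection trees over all rows."""
    rng = np.random.default_rng(seed)
    return [build_rp_tree(X, np.arange(len(X)), leaf_size, rng) for _ in range(n_trees)]

def rp_forest_knn(X, forest, q, k):
    """Approximate kNN: exact ranking over the union of q's leaves across all trees.
    May return fewer than k neighbors if the candidate set is smaller than k."""
    candidates = np.unique(np.concatenate([leaf_for(t, q) for t in forest]))
    dists = np.linalg.norm(X[candidates] - q, axis=1)
    order = np.argsort(dists)[:k]
    return candidates[order], dists[order]

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 16))
forest = build_rp_forest(X, n_trees=10, leaf_size=40)
idx, d = rp_forest_knn(X, forest, X[0], k=5)    # X[0] is its own nearest neighbor
```

Adding trees enlarges the candidate set. In this sketch, that is how the ensemble trades extra distance computations for fewer missed neighbors.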
As a tree-based method, rpForests has very low computational complexity, and on many datasets it achieves remarkable accuracy: both the missing rate of kNNs and the discrepancy in k-th nearest neighbor distances decay quickly.
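The sketch below makes the two accuracy measures concrete by comparing an approximate kNN result against exact brute-force search. The definitions are assumptions. The missing rate is taken as the fraction of the true k nearest neighbors absent from the approximate result. The discrepancy is taken as the relative gap between the approximate and exact k-th neighbor distances. The "approximate" search here is only exact search over a random half of the points, used as a stand-in for a real index. The function names are hypothetical.

```python
import numpy as np

def brute_force_knn(X, q, k):
    """Exact kNN by scanning all points; returns (indices, distances), nearest first."""
    d = np.linalg.norm(X - q, axis=1)
    idx = np.argsort(d)[:k]
    return idx, d[idx]

def missing_rate(true_idx, approx_idx):
    """Fraction of the exact k nearest neighbors that the approximate result misses."""
    found = len(np.intersect1d(true_idx, approx_idx))
    return 1.0 - found / len(true_idx)

def kth_distance_discrepancy(true_d, approx_d):
    """Relative excess of the approximate k-th NN distance over the exact one (>= 0)."""
    return (approx_d[-1] - true_d[-1]) / true_d[-1]

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
q = rng.standard_normal(10)
true_idx, true_d = brute_force_knn(X, q, k=10)

# Stand-in for an approximate index: exact search restricted to a random half of X.
subset = rng.choice(len(X), size=500, replace=False)
sub_idx, approx_d = brute_force_knn(X[subset], q, k=10)
approx_idx = subset[sub_idx]
print(missing_rate(true_idx, approx_idx), kth_distance_discrepancy(true_d, approx_d))
```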
Because rpForests is an ensemble, it is easily parallelized to run on clusters or multi-core computers; its running time is shown to be nearly inversely proportional to the number of cores or machines.
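Because the trees are built independently, forest construction parallelizes naturally. This sketch gives every tree its own child seed from `numpy.random.SeedSequence.spawn`, so the resulting forest is the same regardless of the number of workers. It uses a thread pool to stay self-contained. For CPU-bound work on multiple cores, `ProcessPoolExecutor` or a cluster framework would be the usual substitute. The compact tree builder and the names `_build_one` and `build_forest_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def build_rp_tree(X, idx, leaf_size, rng):
    """Compact random-projection tree: median split on a random direction.
    Internal nodes are tuples (direction, split, left, right); leaves are index arrays."""
    if len(idx) <= leaf_size:
        return idx
    w = rng.standard_normal(X.shape[1])
    proj = X[idx] @ w
    split = np.median(proj)
    left, right = idx[proj <= split], idx[proj > split]
    if len(left) == 0 or len(right) == 0:    # ties: cannot split further
        return idx
    return (w, split, build_rp_tree(X, left, leaf_size, rng),
            build_rp_tree(X, right, leaf_size, rng))

def _build_one(args):
    """Worker task: build one tree from its own independent seed."""
    X, leaf_size, seed_seq = args
    rng = np.random.default_rng(seed_seq)
    return build_rp_tree(X, np.arange(len(X)), leaf_size, rng)

def build_forest_parallel(X, n_trees, leaf_size=20, n_workers=4, seed=0):
    """Build n_trees independent trees on a pool of n_workers workers.
    One child seed per tree makes the forest independent of n_workers."""
    seeds = np.random.SeedSequence(seed).spawn(n_trees)
    tasks = [(X, leaf_size, s) for s in seeds]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(_build_one, tasks))     # map preserves tree order

X = np.random.default_rng(0).standard_normal((1000, 8))
forest = build_forest_parallel(X, n_trees=16, leaf_size=30, n_workers=4)
```

Seeding per tree rather than per worker is the design choice that keeps results reproducible when the same job runs on a different number of cores or machines.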
Two other topics treat the data in machine learning as a computing resource.
Existing learning algorithms typically assume that all the data reside in one centralized place, yet increasingly the data are located at a number of distributed sites, and we wish to learn over the data from all sites with low communication overhead.
In addition, the data of interest often share features with other datasets from multiple sources, and it is desirable to take advantage of such auxiliary datasets.
We propose two approaches under this topic: fast, communication-efficient spectral clustering over distributed data, and fuzzy join of data with shared features.
We propose a novel framework that enables spectral clustering over data from all the physical nodes with minimal communication overhead and a major speedup in computation.
The loss in accuracy is negligible compared with the non-distributed setting.
The approach allows parallel computation locally, where the data are located, and the speedup is greatest when the data are evenly distributed across sites.
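A simple cost model shows why an even split matters. Assume each site's work grows linearly with its local data size, and assume communication and the central step cost nothing. The sites then finish when the largest one does, so the speedup over processing all n points at one site is n / max_i n_i. This model is an assumption for illustration, not the source's analysis, and `ideal_speedup` is a hypothetical name. It does show why two sites cap the gain at 2x and why skewed splits lose most of it.

```python
def ideal_speedup(site_sizes):
    """Idealized speedup of parallel local computation over a single site.

    Assumes (hypothetically) that each site's work is linear in its number of
    points and that communication and the central step are free, so wall time
    is set by the largest site: speedup = n / max_i n_i.
    """
    return sum(site_sizes) / max(site_sizes)

print(ideal_speedup([5000, 5000]))   # even split over two sites: 2.0
print(ideal_speedup([9000, 1000]))   # skewed split: about 1.11
print(ideal_speedup([2500] * 4))     # even split over four sites: 4.0
```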
Experiments with two distributed sites show almost no loss in accuracy and a 2x speedup under various settings.
Because the transmitted data need not be in their original form, the framework also readily addresses privacy concerns about data sharing in distributed computing.
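The abstract does not say how each site summarizes its data, so the following is only a sketch under an assumption, in the spirit of codeword-based approximate spectral clustering. The sketch works in three steps:

1. Each site compresses its points into a few k-means codewords, and only those codewords are sent to a central node.
2. The central node runs normalized (Ng-Jordan-Weiss style) spectral clustering on the pooled codewords.
3. Each site labels its own points through their codewords.

All function names and parameters (`n_codewords`, `sigma`) are hypothetical, and this is not presented as the author's exact framework.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means with farthest-first initialization; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    C = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[d2.argmax()])                   # next center: farthest point so far
    C = np.array(C, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                # leave an emptied centroid in place
                C[j] = X[labels == j].mean(axis=0)
    labels = ((X[:, None, :] - C[None]) ** 2).sum(axis=2).argmin(axis=1)
    return C, labels

def spectral_clustering(P, n_clusters, sigma=1.0):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) on the rows of P."""
    d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-d2 / (2 * sigma ** 2))             # Gaussian affinities
    np.fill_diagonal(A, 0.0)
    s = 1.0 / np.sqrt(A.sum(axis=1))
    M = s[:, None] * A * s[None, :]                # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)                    # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]                      # eigenvectors of the largest ones
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    return kmeans(U, n_clusters)[1]

def distributed_spectral_clustering(sites, n_codewords, n_clusters, sigma=1.0):
    # 1) Each site compresses its own rows into codewords (runs locally, in parallel).
    local = [kmeans(X, n_codewords, seed=i) for i, X in enumerate(sites)]
    # 2) Only codewords travel to the central node; raw rows stay at their site.
    pooled = np.vstack([C for C, _ in local])
    code_labels = spectral_clustering(pooled, n_clusters, sigma)
    # 3) Codeword labels go back; each site labels its points via their codeword.
    out, start = [], 0
    for C, point_to_code in local:
        out.append(code_labels[start:start + len(C)][point_to_code])
        start += len(C)
    return out

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(6.0, 0.5, (200, 2))])
rng.shuffle(X)
site_a, site_b = X[:200], X[200:]
labels_a, labels_b = distributed_spectral_clustering(
    [site_a, site_b], n_codewords=10, n_clusters=2)
```

In this sketch only codeword coordinates and their labels cross the network, which is one concrete sense in which the transmitted data are not in their original form.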
We also propose fuzzy join, an efficient algorithm that enhances learning from the given data by leveraging auxiliary data through shared features.
Fuzzy join extracts additional information along the dimensions implied by features that appear in the auxiliary data but not in the given data.
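The following is a minimal sketch of the augmentation step, assuming numeric features. Each given row is matched to its k nearest auxiliary rows on the shared columns only, and the mean of those rows' auxiliary-only columns is appended. Averaging over k neighbors is one simple way to damp noise. The function name, the Euclidean metric, and the averaging rule are assumptions. The sketch also uses brute-force O(nm) neighbor search for brevity rather than a random-projection-forest index.

```python
import numpy as np

def fuzzy_join(given, shared_cols, aux_shared, aux_extra, k=3):
    """Augment `given` with features that only the auxiliary data has.

    Each given row is matched to its k nearest auxiliary rows using only the
    shared columns (Euclidean distance, brute force), and the mean of those
    rows' auxiliary-only features is appended to it.
    """
    g = given[:, shared_cols]
    d2 = ((g[:, None, :] - aux_shared[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]        # k nearest auxiliary rows per given row
    imputed = aux_extra[nn].mean(axis=1)      # averaging over neighbors damps noise
    return np.hstack([given, imputed])

aux_shared = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
aux_extra = np.array([[10.0], [20.0], [30.0], [40.0]])    # feature absent from given
given = np.array([[7.0, 0.1, 0.05],                       # column 0 only in given;
                  [8.0, 4.9, 5.2]])                       # columns 1-2 are shared
print(fuzzy_join(given, [1, 2], aux_shared, aux_extra, k=1))
```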
Our implementation, based on random projection forests, is efficient, with log-linear computational complexity, and is robust to noise in the data.
Experiments demonstrate the practicality of the approach.
Fuzzy join extends the scope of the join operation in relational databases by performing the join on columns that are not index keys and allowing non-exact matches between rows from different datasets.
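The contrast with a relational join can be sketched on two small tables whose shared column `x` holds measured values rather than keys. An equi-join finds no pairs because no values match exactly. A nearest-match join instead pairs each left row with the closest right row within a tolerance. Both functions and the `max_dist` parameter are hypothetical illustrations, not the fuzzy join algorithm itself.

```python
import math

def exact_join(left, right, col):
    """Relational equi-join: rows pair up only when `col` values are identical."""
    index = {}
    for r in right:
        index.setdefault(r[col], []).append(r)
    return [{**row, **r} for row in left for r in index.get(row[col], [])]

def nearest_join(left, right, cols, max_dist=math.inf):
    """Pair each left row with the right row closest on `cols` (which need not be
    keys), keeping the pair only if that distance is at most max_dist."""
    out = []
    for row in left:
        key = [row[c] for c in cols]
        dist, best = min(((math.dist(key, [r[c] for c in cols]), r) for r in right),
                         key=lambda t: t[0], default=(math.inf, None))
        if best is not None and dist <= max_dist:
            # keep the left row's values for the matched columns, add the rest from right
            out.append({**row, **{c: v for c, v in best.items() if c not in cols}})
    return out

left = [{"a": 1, "x": 0.00}, {"a": 2, "x": 1.00}]
right = [{"x": 0.02, "z": "p"}, {"x": 0.97, "z": "q"}, {"x": 5.00, "z": "r"}]
print(exact_join(left, right, "x"))                      # [] since no x matches exactly
print(nearest_join(left, right, ["x"], max_dist=0.1))
```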

Related Results

CLOUD COMPUTING - NAVIGATING THE DIGITAL SKY
“Cloud Computing – Navigating the Digital Sky” is an extensive guide designed to provide a thorough understanding of cloud computing, an essential technology in today’s digital age...
Current state and prospects of edge computing within the Internet of Things (IoT) ecosystem
The burgeoning growth of the Internet of Things (IoT) has prompted a paradigm shift in computing architectures, leading to the emergence and rapid evolution of edge computing. This...
Dynamic Pricing in Edge computing Resource Allocation Based on Stackelberg Dynamic Game
Abstract The dynamic changes of mobile terminals have led to the more complex environment for edge computing resource allocation. Edge nodes are generally mobile wireless d...
New approaches for resource management and job scheduling for HEP grid computing
(English) The Large Hadron Collider (LHC) ALICE (A Large Ion Collider Experiment) experiment uses grid computing for its extensive data processing and analysis. The ALICE Grid is c...
Advancements in Quantum Computing and Information Science
Abstract: The chapter "Advancements in Quantum Computing and Information Science" explores the fundamental principles, historical development, and modern applications of quantum co...
Influence of Strategic Human Resource Management Practices on Performance of Public Universities in Kenya
Purpose: The objective of the study was to determine the effect of Strategic Human Resource Management Practices (SHRMPs) on performance of public universities. Methodology: ...
DE-RALBA: dynamic enhanced resource aware load balancing algorithm for cloud computing
Cloud computing provides an opportunity to gain access to the large-scale and high-speed resources without establishing your own computing infrastructure for executing the high-per...
The Dual-Helical Evolution of Network Computing: Toward Autonomous Intelligence Over Computing Power Networks
Driven by explosively growing application demands and rapid technological advances, network computing paradigms have continuously reshaped how computational resources are organized...
