Deploying and scaling distributed parallel deep neural networks on the Tianhe-3 prototype system
Abstract: Growth in computing power has made it possible to improve the feature extraction and data fitting capabilities of deep neural networks (DNNs) by increasing their depth and model complexity. However, large datasets and complex models greatly increase DNN training overhead, so accelerating training becomes a key task. The Tianhe-3 is designed to reach E-class (exascale) peak speed, and this computing power offers a potential opportunity for DNN training. We implement and scale LeNet, AlexNet, VGG, and ResNet training on single MT-2000+ and FT-2000+ compute nodes and on multi-node clusters, and we propose a Dynamic Allreduce communication optimization strategy that improves the gradient synchronization process, based on the ARM architecture features of the Tianhe-3 prototype. The results provide experimental data and a theoretical basis for further improving the performance of the Tianhe-3 prototype in large-scale distributed training of neural networks.
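For readers unfamiliar with the gradient synchronization step the abstract refers to, the sketch below simulates a plain ring allreduce that averages per-worker gradients in data-parallel training. It is not the paper's Dynamic Allreduce strategy, whose details are not given here. The function name `ring_allreduce_mean` is hypothetical, and the workers are simulated inside one Python process rather than communicating over MPI.

```python
from typing import List


def ring_allreduce_mean(grads: List[List[float]]) -> List[List[float]]:
    """Simulate a ring allreduce over per-worker gradients (hypothetical helper).

    grads[r] is the local gradient vector of worker r. Every worker ends up
    holding the element-wise mean of all workers' gradients.
    """
    n = len(grads)
    length = len(grads[0])
    # Split each vector into n chunks; chunk c covers [bounds[c], bounds[c + 1]).
    bounds = [c * length // n for c in range(n + 1)]
    bufs = [list(g) for g in grads]

    # Phase 1, reduce-scatter: after n - 1 steps, worker r holds the full sum
    # of chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks first so all transfers in a step are simultaneous.
        sends = []
        for rank in range(n):
            c = (rank - step) % n
            sends.append((c, bufs[rank][bounds[c]:bounds[c + 1]]))
        for rank, (c, payload) in enumerate(sends):
            dst = (rank + 1) % n
            for k, value in enumerate(payload):
                bufs[dst][bounds[c] + k] += value

    # Phase 2, all-gather: circulate the fully reduced chunks around the ring.
    for step in range(n - 1):
        sends = []
        for rank in range(n):
            c = (rank + 1 - step) % n
            sends.append((c, bufs[rank][bounds[c]:bounds[c + 1]]))
        for rank, (c, payload) in enumerate(sends):
            dst = (rank + 1) % n
            bufs[dst][bounds[c]:bounds[c] + len(payload)] = payload

    # Divide by the worker count to turn the sum into a mean gradient.
    return [[value / n for value in buf] for buf in bufs]


# Three simulated workers, each with a 4-element local gradient.
local_grads = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]
synced = ring_allreduce_mean(local_grads)
print(synced[0])  # [5.0, 6.0, 7.0, 8.0]
```

In this pattern each worker sends and receives roughly twice its gradient size regardless of the number of workers, which is one reason allreduce variants are a common target for communication optimization in distributed training.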
Springer Science and Business Media LLC
Related Results
Fuzzy Chaotic Neural Networks
An understanding of the human brain’s local function has improved in recent years. But the cognition of human brain’s working process as a whole is still obscure. Both fuzzy logic ...
On the role of network dynamics for information processing in artificial and biological neural networks
Understanding how interactions in complex systems give rise to various collective behaviours has been of interest for researchers across a wide range of fields. However, despite ma...
Deep convolutional neural network and IoT technology for healthcare
Background
Deep Learning is an AI technology that trains computers to analyze data in an approach similar to the human brain. Deep learning algorithms can find ...
Memorization capacity and robustness of neural networks
Machine learning, and deep learning in particular, has recently undergone rapid advancements. To contribute to a rigorous understanding of deep learning, this thesis explores two d...
On Robust and Efficient Parallel Reservoir Simulation on Tianhe-2
Abstract
Parallel reservoir simulators are now widely used with availability of super computers. Modern massively parallel supercomputers demonstrate great power for...
FinFET Devices and Integration
Through more than a decade of industry wide R&D effort, 3D-FinFET has found its way into manufacturing. In this abstract, we review the key progress in process and integration ...
ACM SIGCOMM computer communication review
At some point in the future, how far out we do not exactly know, wireless access to the Internet will outstrip all other forms of access bringing the freedom of mobility to the way...
Artificial neural network for the recognition of human emotions under a backpropagation algorithm
The era of the technological revolution increasingly encourages the development of technologies that facilitate in one way or another people's daily activities, thus generating a g...

