Paper | 3 April 2024
A data parallel approach for distributed neural networks to achieve faster convergence
Nagaraju C., Yenda Ramesh, C. Krishna Mohan
Proceedings Volume 13072, Sixteenth International Conference on Machine Vision (ICMV 2023); 130721A (2024) https://doi.org/10.1117/12.3023413
Event: Sixteenth International Conference on Machine Vision (ICMV 2023), 2023, Yerevan, Armenia
Abstract
The availability of large datasets has significantly contributed to recent advances in deep convolutional neural network (CNN) models. However, training a large CNN model on such datasets is time-consuming. This issue has been addressed by parallelizing and distributing the data or the model during training. There are two ways to implement distributed deep learning: data parallelism and model parallelism. Data parallelism distributes the dataset across multiple workers, allowing them to process different portions simultaneously. While increasing the number of workers reduces computation time, it also introduces additional communication time; in some cases, the added communication time can outweigh the benefit of the reduced computation time. In this paper, we focus on reducing the overall computation time of the data-parallel approach through two strategies. First, we preserve the dataset distribution across all workers, ensuring that each worker trains on representative data. Second, we localize parameters and quantize gradients to three levels, {-1, 0, 1}, to reduce communication delays between the server and workers, as well as among the workers themselves. By preserving the data distribution while sampling from the entire dataset, each partition retains a similar mean and variance (capturing the important first- and second-order statistics), which guarantees that all workers train their local models on uniformly distributed data rather than on random splits. Localizing parameters limits server-worker communication to gradients only, and quantizing gradients to 2 bits achieves our objective of reducing computation time by enabling faster convergence without compromising test or validation accuracy. The experimental results demonstrate that these strategies effectively reduce communication overhead in distributed deep learning and lead to faster convergence compared with methods that use random data sampling, with improvements observed across multiple datasets, including MNIST, CIFAR-10, and Tiny ImageNet.
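The sketch below is a minimal illustration of the two mechanisms named in the abstract, not the authors' implementation: label-stratified partitioning is assumed as the distribution-preserving sampling scheme, and a magnitude-threshold rule (0.7 times the mean absolute gradient, a common heuristic and an assumption here) is used for ternary quantization to {-1, 0, 1}. The function names (stratified_partitions, ternarize, dequantize) and all hyperparameters are illustrative.

```python
# Minimal sketch (assumed details, not the paper's code):
# (1) label-stratified partitioning so each worker's shard keeps a similar class
#     distribution (and hence similar mean/variance), and
# (2) ternary gradient quantization to {-1, 0, +1} plus a single scale factor,
#     so only 2-bit codes and one float need to be communicated.
import numpy as np

def stratified_partitions(labels, num_workers, seed=0):
    """Split sample indices into num_workers shards, preserving class proportions."""
    rng = np.random.default_rng(seed)
    shards = [[] for _ in range(num_workers)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Spread this class's samples roughly evenly over all workers.
        for w, chunk in enumerate(np.array_split(idx, num_workers)):
            shards[w].extend(chunk.tolist())
    return [np.array(s) for s in shards]

def ternarize(grad, threshold=None):
    """Quantize a gradient vector to {-1, 0, +1} codes plus one scale factor.

    The threshold rule below is an assumed heuristic: entries with magnitude
    below 0.7 * mean(|g|) are mapped to 0.
    """
    if threshold is None:
        threshold = 0.7 * np.abs(grad).mean()
    mask = np.abs(grad) >= threshold
    q = np.sign(grad) * mask                     # values in {-1, 0, +1}
    scale = np.abs(grad[mask]).mean() if mask.any() else 0.0
    return q.astype(np.int8), float(scale)       # 2-bit codes + one float to send

def dequantize(q, scale):
    """Receiver side: recover an approximate dense gradient from the ternary codes."""
    return q.astype(np.float32) * scale

# Toy usage: 4 workers, 10-class labels, one fake gradient round-trip.
labels = np.random.randint(0, 10, size=1000)
shards = stratified_partitions(labels, num_workers=4)
grad = np.random.randn(256)
q, s = ternarize(grad)
approx_grad = dequantize(q, s)
```

In this sketch each worker would train on its own shard and exchange only the ternary codes and scale with the server, which is one plausible reading of "localizing parameters" so that full-precision weights never leave the worker.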
© (2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Nagaraju C., Yenda Ramesh, and C. Krishna Mohan "A data parallel approach for distributed neural networks to achieve faster convergence", Proc. SPIE 13072, Sixteenth International Conference on Machine Vision (ICMV 2023), 130721A (3 April 2024); https://doi.org/10.1117/12.3023413
KEYWORDS
Data modeling, Education and training, Double positive medium, Deep learning, Quantization, Computation time, Neural networks