The availability of large datasets has significantly contributed to recent advances in deep Convolutional Neural Network (CNN) models. However, training a large CNN model on such datasets is time-consuming, an issue commonly addressed by parallelizing and distributing the data or model during training. There are two main ways to implement distributed deep learning: data parallelism and model parallelism. Data parallelism distributes the dataset across multiple workers, allowing them to process different portions simultaneously. While increasing the number of workers reduces computation time, it also introduces additional communication time; in some cases, the added communication time can outweigh the benefit of the reduced computation time. In this paper, we focus on reducing the overall training time of the data-parallel approach through two strategies. First, we preserve the dataset's distribution across all workers, ensuring that each worker trains on representative data. Second, we localize model parameters and quantize gradients to three levels, {-1, 0, 1}, to reduce communication delays between the server and the workers, as well as among the workers themselves. Because the sampling preserves the distribution of the full dataset, each partition retains a similar mean and variance (capturing the important first- and second-order statistics), which guarantees that every worker trains its local model on identically distributed data rather than on an arbitrary random split. In addition, localizing parameters limits communication between the server and the workers to gradients only.
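The abstract does not spell out the sampling procedure, so the sketch below is only one plausible way to realize distribution-preserving partitioning: sort samples by a scalar summary (here, the per-sample norm, my own choice) and deal them round-robin, so every worker's shard spans the full value range and keeps a similar mean and variance. The function name and all details are illustrative, not the paper's actual method.

```python
import numpy as np

def stratified_partitions(X, n_workers, seed=0):
    """Split X into n_workers shards with similar mean and variance.

    Sorting by a scalar key and dealing samples round-robin is a simple
    stratification: each shard receives every n_workers-th sample of the
    sorted order, so the shards cover the same value range.
    """
    rng = np.random.default_rng(seed)
    key = np.linalg.norm(X.reshape(len(X), -1), axis=1)  # scalar summary per sample
    order = np.argsort(key, kind="stable")
    parts = [order[w::n_workers] for w in range(n_workers)]
    # Shuffle within each shard so local minibatches remain randomized.
    return [X[rng.permutation(p)] for p in parts]

# Toy data: 10,000 samples with 8 features, global mean ~5 and variance ~4.
X = np.random.default_rng(1).normal(5.0, 2.0, size=(10_000, 8))
shards = stratified_partitions(X, n_workers=4)
for s in shards:
    print(s.mean(), s.var())  # each shard tracks the global statistics
```

A purely random split would also approximate the global statistics in expectation, but the stratified deal controls shard-to-shard deviation much more tightly, which is the property the first strategy relies on.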
Furthermore, by encoding the three-level gradients in 2 bits, we reduce overall training time through faster convergence without compromising test or validation accuracy. The experimental results demonstrate that employing these strategies in distributed deep learning effectively reduces communication overhead and leads to faster convergence compared with methods that use random data sampling. These improvements were observed across multiple datasets, including MNIST, CIFAR-10, and Tiny ImageNet.
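The abstract names the three quantization levels but not the quantizer itself; as a rough illustration, the sketch below uses a well-known stochastic ternary scheme (in the spirit of TernGrad) in which each gradient entry is rounded to {-1, 0, 1} with a probability that keeps the estimate unbiased, plus one float scale per tensor. The function names and the choice of scale (the max absolute value) are assumptions, not necessarily the paper's quantizer.

```python
import numpy as np

def ternarize(grad, rng):
    """Stochastic ternary quantization of a gradient tensor.

    Each entry g_i becomes sign(g_i) with probability |g_i|/s and 0
    otherwise, where s = max|g|. Then E[s * q_i] = g_i, so the scheme is
    unbiased. The int8 entries lie in {-1, 0, 1} and need only 2 bits
    each on the wire, plus the single scalar s.
    """
    s = float(np.abs(grad).max())
    if s == 0.0:
        return np.zeros(grad.shape, dtype=np.int8), 0.0
    keep = rng.random(grad.shape) < np.abs(grad) / s
    ternary = (np.sign(grad) * keep).astype(np.int8)
    return ternary, s

def dequantize(ternary, s):
    # Reconstruct the (unbiased) gradient estimate on the receiver side.
    return ternary.astype(np.float64) * s

rng = np.random.default_rng(0)
g = rng.normal(size=100_000)
q, s = ternarize(g, rng)
g_hat = dequantize(q, s)  # noisy but unbiased estimate of g
```

Relative to sending 32-bit floats, 2 bits per entry is a 16x reduction in gradient traffic between the server and the workers, which is where the communication savings in the abstract come from.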