Accelerating Gossip-Based Deep Learning in Heterogeneous Edge Computing Platforms

Journal article (2021)

Authors

Rui Han

Shilin Li Beijing Institute of Technology

Xiangwei Wang Beijing Institute of Technology

Chi Harold Liu

Gaofeng Xin Beijing Institute of Technology

Lydia Y. Chen Data-Intensive Systems (TU Delft)

Research Group

Data-Intensive Systems (TU Delft)

Deep learning, Gossip, Edge computing, Decentralized training

To reference this document use:

http://resolver.tudelft.nl/uuid:e94cdf75-7a0c-4737-8ebd-90e7049535b0

More Info

Published Date

2021

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Faculty

Electrical Engineering, Mathematics and Computer Science

Department

Software Technology

Research Group

Data-Intensive Systems

Abstract

With the exponential growth of data created at the network edge, decentralized, Gossip-based training of deep learning (DL) models on edge computing (EC) platforms has gained tremendous research momentum, owing to its ability to learn from resource-constrained edge nodes with limited network connectivity. Today's edge devices are extremely heterogeneous, e.g., in their hardware and software stacks, which results in high variation in training time and induces extra delay to synchronize and converge. A large body of prior art accelerates DL, whether through data or model parallelization, via a centralized server, e.g., the parameter server scheme, which may easily become a system bottleneck or single point of failure. In this article, we propose EdgeGossip, a framework specifically designed to accelerate decentralized, Gossip-based DL training on heterogeneous EC platforms. EdgeGossip features: (i) low performance variation among multiple EC platforms during iterative training, and (ii) accuracy-aware training to quickly obtain the best possible model accuracy. We implement EdgeGossip based on popular Gossip algorithms and demonstrate its effectiveness using real-world DL workloads: it reduces model training time by an average of 2.70 times while incurring accuracy losses of only 0.78 percent.
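
For readers unfamiliar with the Gossip-based training the abstract refers to, the following is a minimal sketch of generic pairwise gossip averaging, in which nodes exchange parameters with a peer instead of a central server. It is not EdgeGossip's algorithm: the function names, the node count, and the toy parameter values are all assumptions made for illustration, and the local training steps and heterogeneity handling described in the abstract are omitted.

```python
import random


def gossip_round(params, rng):
    """One round of pairwise gossip: random disjoint pairs of nodes
    replace their parameters with the pair's average (no central server)."""
    order = list(range(len(params)))
    rng.shuffle(order)
    # Pair up nodes in shuffled order; with an odd count, one node sits out.
    for i, j in zip(order[0::2], order[1::2]):
        avg = [(a + b) / 2.0 for a, b in zip(params[i], params[j])]
        params[i] = list(avg)
        params[j] = list(avg)
    return params


def spread(params):
    """Largest per-coordinate gap between any two nodes (0 means consensus)."""
    return max(max(col) - min(col) for col in zip(*params))


def mean(params):
    """Per-coordinate average of the parameters across all nodes."""
    return [sum(col) / len(params) for col in zip(*params)]


# Hypothetical setup: 8 nodes, each holding a 2-parameter "model".
rng = random.Random(0)
params = [[float(n), float(-n)] for n in range(8)]

initial_mean = mean(params)
initial_spread = spread(params)

for _ in range(30):
    gossip_round(params, rng)

final_mean = mean(params)
final_spread = spread(params)
print("spread before:", initial_spread, "after:", final_spread)
```

Pairwise averaging keeps the network-wide mean of the parameters unchanged while shrinking the disagreement between nodes. In a real training loop, each node would also take local gradient steps between exchanges.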

Files

09303468.pdf

(pdf | 1.59 MB)

Unknown license

Download not available