PowerSGD
Practical low-rank gradient compression for distributed optimization
A new low-rank gradient compressor based on power iteration that can (i) compress gradients rapidly, (ii) efficiently aggregate the compressed gradients using all-reduce, and (iii) achieve test performance on par with SGD. Among the methods evaluated, the proposed algorithm is the only one that achieves consistent wall-clock speedups when benchmarked against regular SGD with an optimized communication backend. The authors demonstrate reduced training times for convolutional networks as well as LSTMs on common datasets.
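The core idea is to approximate each gradient matrix by a rank-r factorization, refined with one step of power iteration per optimizer step, so that only the two small factors need to be communicated. Below is a minimal single-worker sketch of that step in NumPy; the function name `powersgd_compress`, the shapes, and the warm-started right factor are illustrative assumptions for this sketch, not the project's actual API. In a distributed run, the factors `p` and `q` would each be averaged across workers with an all-reduce, and the residual would feed error feedback into the next gradient.

```python
import numpy as np

def powersgd_compress(grad, q):
    """One power-iteration compression step (illustrative sketch).

    grad : (n, m) gradient matrix to compress
    q    : (m, r) right factor warm-started from the previous step

    Returns factors (p, q) with p @ q.T approximating grad, plus the
    residual that error feedback would add to the next gradient.
    """
    # Left factor: project the gradient onto the current right factor.
    p = grad @ q                      # (n, r)
    # Orthogonalize p so the next projection is well conditioned.
    p, _ = np.linalg.qr(p)
    # (In distributed training, p would be all-reduced across workers here.)
    # Right factor: project the gradient onto the orthonormal left factor.
    q = grad.T @ p                    # (m, r)
    # (q would also be all-reduced across workers.)
    residual = grad - p @ q.T         # error-feedback memory
    return p, q, residual

# Hypothetical usage: compress a 256x128 gradient to rank 2.
rng = np.random.default_rng(0)
grad = rng.standard_normal((256, 128))
q = rng.standard_normal((128, 2))     # random warm-start right factor
p, q, residual = powersgd_compress(grad, q)
print("relative error:", np.linalg.norm(residual) / np.linalg.norm(grad))
```

Communicating the factors `p` (n x r) and `q` (m x r) instead of the full n x m gradient is what makes the compression compatible with all-reduce: averaging the factors across workers is a linear operation, unlike top-k sparsification, which requires a gather.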
Status: inactive
Entered showcase: 2020-05-01
Entry updated: 2024-04-09
Level: Intermediate
Type: Application
Language: Python
License: MIT