PowerSGD
Practical low-rank gradient compression for distributed optimization
A new low-rank gradient compressor based on power iteration that can (i) compress gradients rapidly, (ii) efficiently aggregate the compressed gradients using all-reduce, and (iii) achieve test performance on par with SGD. Among the methods evaluated, the proposed algorithm is the only one that achieves consistent wall-clock speedups when benchmarked against regular SGD with an optimized communication backend. The authors demonstrate reduced training times for convolutional networks as well as LSTMs on common datasets.
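The core idea is to approximate each gradient matrix by a rank-r factorization, refined with one step of power iteration per optimizer step, so that only the two small factors need to be communicated. Below is a minimal single-worker sketch of that step in NumPy; the function name `powersgd_compress`, the shapes, and the warm-started right factor are illustrative assumptions for this sketch, not the project's actual API. In a distributed run, the factors `p` and `q` would each be averaged across workers with an all-reduce, and the residual would feed error feedback into the next gradient.

```python
import numpy as np

def powersgd_compress(grad, q):
    """One power-iteration compression step (illustrative sketch).

    grad : (n, m) gradient matrix to compress
    q    : (m, r) right factor warm-started from the previous step

    Returns factors (p, q) with p @ q.T approximating grad, plus the
    residual that error feedback would add to the next gradient.
    """
    # Left factor: project the gradient onto the current right factor.
    p = grad @ q                      # (n, r)
    # Orthogonalize p so the next projection is well conditioned.
    p, _ = np.linalg.qr(p)
    # (In distributed training, p would be all-reduced across workers here.)
    # Right factor: project the gradient onto the orthonormal left factor.
    q = grad.T @ p                    # (m, r)
    # (q would also be all-reduced across workers.)
    residual = grad - p @ q.T         # error-feedback memory
    return p, q, residual

# Hypothetical usage: compress a 256x128 gradient to rank 2.
rng = np.random.default_rng(0)
grad = rng.standard_normal((256, 128))
q = rng.standard_normal((128, 2))     # random warm-start right factor
p, q, residual = powersgd_compress(grad, q)
print("relative error:", np.linalg.norm(residual) / np.linalg.norm(grad))
```

Communicating the factors `p` (n x r) and `q` (m x r) instead of the full n x m gradient is what makes the compression compatible with all-reduce: averaging the factors across workers is a linear operation, unlike top-k sparsification, which requires a gather.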
Status: inactive
Entered showcase: 2020-05-01
Entry updated: 2024-04-09
Level: Intermediate
Type: Application
Language: Python
License: MIT