Training machine learning models with multiple gradient-computing workers has recently attracted great interest. Communication efficiency is an important consideration in such distributed learning settings, especially when the required communications are expensive in terms of power usage. We develop a new approach that is efficient in the number of communication transmissions: in each round, only the most informative worker results are transmitted, reducing the total number of transmissions. Our ordered gradient approach provably achieves the same order of convergence rate as gradient descent for nonconvex smooth loss functions, while gradient descent always requires more communications. Experiments show significant communication savings compared to the best existing approaches in some cases.
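To make the selection idea concrete, here is a minimal sketch of one plausible reading of the scheme: each worker computes a local gradient, workers are ordered by gradient norm, and only the top-k workers transmit to the server in that round. The quadratic losses, the value of k, and the exact selection and aggregation rules are illustrative assumptions, not the paper's precise algorithm.

```python
import numpy as np

# Hypothetical per-worker least-squares problems (illustrative data only).
rng = np.random.default_rng(0)
num_workers, dim, k = 10, 5, 3          # k = workers allowed to transmit per round (assumed)
A = [rng.standard_normal((20, dim)) for _ in range(num_workers)]
b = [rng.standard_normal(20) for _ in range(num_workers)]

def local_gradient(w, i):
    """Gradient of worker i's loss (1/2)||A_i w - b_i||^2 at the current model w."""
    return A[i].T @ (A[i] @ w - b[i])

w = np.zeros(dim)
lr = 1e-3
for round_ in range(200):
    grads = [local_gradient(w, i) for i in range(num_workers)]
    # Order workers by gradient norm and keep only the k most informative ones,
    # i.e. those whose updates would move the model the most.
    chosen = sorted(range(num_workers),
                    key=lambda i: np.linalg.norm(grads[i]),
                    reverse=True)[:k]
    # Only the chosen workers transmit; the server aggregates what it receives.
    aggregate = sum(grads[i] for i in chosen)
    w -= lr * aggregate
```

Compared with standard distributed gradient descent, which collects all num_workers gradients every round, this sketch transmits only k of them, which is the kind of per-round communication saving the abstract describes.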