12 October 2020

Hiding in the Crowd

To cope with the lack of on-device training samples for machine learning (ML), this article presents a distributed data augmentation algorithm, termed federated data augmentation (FAug). In FAug, devices share a tiny fraction of their local data, i.e., seed samples, and collectively train a synthetic sample generator that can augment each device's local dataset. To further improve FAug, we introduce a multihop-based seed sample collection method and an oversampling technique that mixes up the collected seed samples. Both approaches benefit from the crowd of devices, by hiding private data from preceding hops and by feeding diverse seed samples to the generator. In image classification tasks, simulations demonstrate that the proposed FAug frameworks yield stronger privacy guarantees, lower communication latency, and higher on-device ML accuracy.
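
As a rough illustration of the oversampling idea, the sketch below mixes pairs of collected seed samples into new ones. The abstract does not specify the mixing rule, so the convex combination with a random weight used here is an assumption, and the function name `mix_seed_samples`, its parameters, and the toy data are all hypothetical. It is not the article's implementation.

```python
import random


def mix_seed_samples(seeds, num_out, rng=None):
    """Oversample by mixing random pairs of seed samples.

    Assumption: each new sample is a convex combination of two distinct
    seeds with a uniformly random weight. The source does not state the
    actual mixing rule.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    mixed = []
    for _ in range(num_out):
        a, b = rng.sample(seeds, 2)  # two distinct seed samples
        lam = rng.random()           # mixing weight in [0, 1)
        mixed.append([lam * x + (1.0 - lam) * y for x, y in zip(a, b)])
    return mixed


# Toy "images" flattened to 4 pixels each, with values in [0, 1].
seeds = [
    [0.0, 0.2, 0.4, 0.6],
    [1.0, 0.8, 0.6, 0.4],
    [0.5, 0.5, 0.5, 0.5],
]
augmented = mix_seed_samples(seeds, num_out=10)
```

Because every output is a blend of two seeds, no mixed sample reproduces a single device's seed exactly unless the weight is 0, and three seeds can yield arbitrarily many distinct samples for the generator to train on.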