Neural networks are increasing in scale and sophistication, catalyzing the need for efficient hardware. An inevitability when transferring neural networks to hardware is that non-idealities impact performance. Hardware-aware training, where non-idealities are accounted for during training, is one way to recover performance, but at the cost of generality. In this work, we demonstrate a binary neural network consisting of an array of 20,000 magnetic tunnel junctions (MTJs) integrated on complementary metal-oxide-semiconductor (CMOS) chips. Using 36 dies, we show that even a few defects can degrade the performance of neural networks. We demonstrate hardware-aware training and show that performance recovers to nearly that of ideal networks. We then introduce a robust method, statistics-aware training, that compensates for defects regardless of their specific configuration. When evaluated on the MNIST dataset, statistics-aware solutions differ from software baselines by only 2%. We quantify the sensitivity of networks trained with statistics-aware and conventional methods and demonstrate that the statistics-aware solution is less sensitive to defects when the network loss function is sampled.
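The contrast between the two training modes can be illustrated with a minimal sketch. The PyTorch snippet below is an illustrative assumption, not the paper's implementation: it models defective MTJ weights as stuck at zero, uses one fixed defect map for hardware-aware training, and resamples a defect mask from a defect-rate statistic on every forward pass for statistics-aware training. The class name, layer sizes, defect rate, and stuck-at-zero model are all hypothetical.

```python
# Hypothetical sketch (not the authors' code): hardware-aware vs. statistics-aware
# defect injection for a binary linear layer. Assumed defect model: stuck-at-zero cells.
import torch
import torch.nn as nn


class BinaryLinearWithDefects(nn.Module):
    def __init__(self, in_features, out_features, defect_rate=0.01,
                 fixed_defect_mask=None):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.defect_rate = defect_rate
        # Hardware-aware mode: a measured, die-specific defect map is fixed here.
        self.register_buffer("fixed_defect_mask", fixed_defect_mask)

    def sample_defect_mask(self):
        # Statistics-aware mode: draw a fresh defect configuration each forward
        # pass, so the network adapts to defect statistics, not one configuration.
        return torch.rand_like(self.weight) < self.defect_rate

    def forward(self, x):
        # Binarize weights to +/-1 with a straight-through estimator.
        w_bin = torch.where(self.weight >= 0, 1.0, -1.0)
        w_bin = (w_bin - self.weight).detach() + self.weight
        mask = (self.fixed_defect_mask if self.fixed_defect_mask is not None
                else self.sample_defect_mask())
        # Defective junctions are modeled as contributing no signal (assumption).
        w_eff = torch.where(mask, torch.zeros_like(w_bin), w_bin)
        return x @ w_eff.t()


# Statistics-aware training: leave fixed_defect_mask=None so each batch sees a
# different sampled defect map; hardware-aware training would pass the measured map.
layer = BinaryLinearWithDefects(784, 10, defect_rate=0.01, fixed_defect_mask=None)
out = layer(torch.randn(32, 784))
```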