by Yangqing on 14 May 2014
For a sanity check, try running with a learning rate of 0 to see whether any NaN errors pop up (they shouldn’t, since no learning takes place). If the data is not initialized well, even 0.0001 may be too high a learning rate.
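To make the check concrete, here is a minimal sketch in plain Python rather than Caffe itself. It assumes a toy one-weight least-squares problem with deliberately unnormalized inputs; the data, the `train` helper, and the step count are all hypothetical and only stand in for a real solver run.

```python
import math

def train(learning_rate, steps=50):
    """Run plain gradient descent on a 1-D least-squares toy problem.

    Returns the list of loss values seen before each update.
    """
    # Hypothetical toy data with large, unnormalized inputs (y = 2 * x).
    xs = [100.0, 200.0, 300.0]
    ys = [200.0, 400.0, 600.0]
    w = 0.5  # arbitrary initial weight
    losses = []
    for _ in range(steps):
        residuals = [w * x - y for x, y in zip(xs, ys)]
        loss = sum(r * r for r in residuals) / len(xs)
        losses.append(loss)
        if not math.isfinite(loss):
            break  # stop as soon as the loss becomes inf or NaN
        grad = sum(2.0 * r * x for r, x in zip(residuals, xs)) / len(xs)
        w -= learning_rate * grad
    return losses

# Sanity check: with a learning rate of 0 the weight never moves,
# so the loss must stay finite and constant.
zero_lr = train(0.0)
print("lr=0      first/last loss:", zero_lr[0], zero_lr[-1])

# On this badly scaled data, 0.0001 already overshoots and the loss grows.
high_lr = train(1e-4)
print("lr=1e-4   first/last loss:", high_lr[0], high_lr[-1])

# A much smaller rate shrinks the loss instead.
low_lr = train(1e-6)
print("lr=1e-6   first/last loss:", low_lr[0], low_lr[-1])
```

If the zero-rate run already produces NaN, the problem is in the data or the forward pass rather than in the updates. If only the nonzero run blows up, lowering the rate or rescaling the inputs is the next thing to try.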
by sguada on 13 May 2014
Try different initializations, for instance setting the bias to 0.1.
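As a rough illustration of what changing the bias initialization does, the sketch below is plain Python, not Caffe. It assumes a ReLU layer with small Gaussian weights; the layer sizes, the `init_layer` and `active_fraction` helpers, and the input vector are hypothetical. One commonly cited reason for a small positive bias is that it keeps more ReLU units active at the start of training. Whether that is what helps in the cases discussed here is an assumption.

```python
import random

def init_layer(n_in, n_out, bias_value, weight_std=0.01, seed=0):
    """Create Gaussian weights and constant biases for one dense layer."""
    rng = random.Random(seed)  # fixed seed so both runs share the same weights
    weights = [[rng.gauss(0.0, weight_std) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [bias_value] * n_out
    return weights, biases

def active_fraction(weights, biases, x):
    """Fraction of units whose pre-activation is positive (ReLU passes them)."""
    active = 0
    for row, b in zip(weights, biases):
        z = sum(w * xi for w, xi in zip(row, x)) + b
        if z > 0:
            active += 1
    return active / len(biases)

x = [1.0] * 10  # hypothetical input vector
for bias_value in (0.0, 0.1):
    weights, biases = init_layer(n_in=10, n_out=100, bias_value=bias_value)
    print("bias", bias_value, "active fraction", active_fraction(weights, biases, x))
```

Because both runs use the same seed, the only difference between them is the bias, so the active fraction with bias 0.1 can never be lower than with bias 0.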
References:
On custom data training diverges (loss = NaN) #409
nan issue with CIFAR10 example when running on CPU only #393