On Fri, 9 Mar 2018 11:38:55, Robert Kern wrote:
Sorry for being a bit inaccurate. My Scala code actually mirrors the NumPy-based random initialization: I sample from a Gaussian with mean = 0 and std dev = 1, then multiply by 0.01.
Have you verified this? I.e. save out the Scala-initialized network and load it up with numpy to check the mean and std dev? How about if you run the numpy NN training with the Scala-initialized network? Does that also diverge?
I did what you suggested, and it turned out that my NumPy NN code behaves exactly like the Scala code when it uses the Scala-initialized network. After digging deeper, I found and fixed a bug in how I was doing the random initialization, and it's working correctly now. Thanks a lot for your help!
Marko
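For anyone reading this thread later, the check suggested above (compare the mean and std dev of the initialized weights against what the initialization is supposed to produce) could be sketched roughly as follows. This is only an illustration, not the code from either implementation: the helper names `init_weights`, `summarize` and `matches_expected_init`, the array shape, the seed and the 10% tolerance are all assumptions made for the example. In practice the weights saved from Scala would be loaded from a file instead of being generated here.

```python
import numpy as np


def init_weights(shape, scale=0.01, seed=0):
    """Sample from N(0, 1), then multiply by `scale`, as described in the thread."""
    rng = np.random.default_rng(seed)  # seed is an arbitrary choice for the example
    return rng.standard_normal(shape) * scale


def summarize(weights):
    """Return (mean, std dev) computed over all entries of a weight array."""
    w = np.asarray(weights, dtype=float)
    return float(w.mean()), float(w.std())


def matches_expected_init(weights, scale=0.01, rel_tol=0.1):
    """Hypothetical sanity check: mean close to 0 and std dev close to `scale`."""
    mean, std = summarize(weights)
    return abs(mean) < rel_tol * scale and abs(std - scale) < rel_tol * scale


# Weights initialized as intended (std dev around 0.01).
w_ok = init_weights((256, 128))

# Stand-in for a buggy initialization where the 0.01 scaling was not applied.
w_unscaled = init_weights((256, 128), scale=1.0)

print("intended:", summarize(w_ok), matches_expected_init(w_ok))
print("unscaled:", summarize(w_unscaled), matches_expected_init(w_unscaled))
```

Running the same summary on weights exported from the other implementation is a quick way to tell whether a divergence comes from the initialization or from the training code itself.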