[scikit-learn] Using a new random number generator in libsvm and liblinear

Adrin adrin.jalali at gmail.com
Thu Jan 2 10:40:35 EST 2020


liblinear and libsvm use the C `rand()` function, which only returns numbers
up to 32767 on the Windows platform. This PR
<https://github.com/scikit-learn/scikit-learn/pull/13511> proposes the
following fix:

*Fixed a convergence issue in ``libsvm`` and ``liblinear`` on Windows
platforms impacting all related classifiers and regressors. The random number
generator used to randomly select coordinates in the coordinate descent
algorithm was C ``rand()``, that is only able to generate numbers up to
``32767`` on the Windows platform. It was replaced with C++11 ``mt19937``, a
Mersenne Twister that correctly generates 31bits/63bits random numbers on all
platforms. In addition, the crude "modulo" postprocessor used to get a random
number in a bounded interval was replaced by the tweaked Lemire method as
suggested by `this post
<http://www.pcg-random.org/posts/bounded-rands.html>`_.*
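To make the two problems concrete, here is a minimal sketch in Python rather
than the C++ used in the PR. It is not the PR's code: the function names, the
32-bit word size and the 100000-coordinate example are assumptions chosen for
illustration, and Python's `random.Random` (itself a Mersenne Twister) stands
in for C++11 `mt19937`. The bounded draw follows the basic multiply-and-reject
scheme described in the linked post, without whatever tweaks the PR applies.

```python
import random

# 15-bit ceiling of C rand() on Windows, as stated in the changelog entry.
RAND_MAX_WINDOWS = 32767


def modulo_bounded(raw: int, n: int) -> int:
    """Crude 'modulo' reduction of a raw random integer into [0, n)."""
    return raw % n


def lemire_bounded(rng: random.Random, n: int, bits: int = 32) -> int:
    """Draw an unbiased integer in [0, n) by multiply-and-reject.

    Hypothetical helper for illustration only; `bits` is the width of the
    raw words pulled from the generator.
    """
    if not 0 < n <= (1 << bits):
        raise ValueError("n must satisfy 0 < n <= 2**bits")
    mask = (1 << bits) - 1
    m = rng.getrandbits(bits) * n
    low = m & mask
    if low < n:
        # Reject the few raw values that would make some outputs more
        # likely than others; (2**bits) % n of them are discarded.
        threshold = (1 << bits) % n
        while low < threshold:
            m = rng.getrandbits(bits) * n
            low = m & mask
    # The high word of the product is the bounded result.
    return m >> bits


def highest_index_reachable_by_modulo(n: int) -> int:
    """Largest index a 15-bit rand() can ever select among n coordinates."""
    return max(modulo_bounded(raw, n) for raw in range(RAND_MAX_WINDOWS + 1))
```

With 100000 coordinates, `highest_index_reachable_by_modulo(100000)` is 32767,
so roughly two thirds of the coordinates would never be visited, while
`lemire_bounded(random.Random(0), 100000)` can return any index in the range.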

In order to keep the models consistent across platforms, we'd like to use
the same (new) rng on all platforms, which means after this change the
generated models may be slightly different to what they are now. We'd like
to hear any concerns on the matter from the community, here or on the PR,
before merging the fix.
