[scikit-learn] Model checksums

Gael Varoquaux gael.varoquaux at normalesup.org
Tue Dec 13 15:10:47 EST 2016


What do you mean by non-deterministic? If you set the random_state of the models, we try to make them deterministic. Most often, any residual variability is numerical noise that reveals statistical error bars.
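
If the models are deterministic once random_state is set, one possible way to sidestep pickle is to key the cache on the estimator's class and parameters rather than on serialized bytes. The sketch below is only an illustration under that assumption, not how scikit-learn does anything internally. It relies on get_params(), which scikit-learn estimators provide. The names params_checksum and StandInEstimator are hypothetical, and cached here is a simple in-memory stand-in for the decorator in the question. StandInEstimator exists only so the snippet runs without scikit-learn installed.

```python
import functools
import hashlib
import json


def params_checksum(model, dataparams):
    """Return a SHA-256 hex digest of the model class, its params, and dataparams."""
    payload = {
        "class": f"{type(model).__module__}.{type(model).__qualname__}",
        # get_params() is the scikit-learn estimator API. Values that JSON
        # cannot encode (e.g. nested estimators) fall back to repr().
        "params": model.get_params(),
        # Assumption: dataparams is a dict with string keys.
        "data": dataparams,
    }
    # sort_keys gives a canonical ordering, so equal dicts hash equally.
    canonical = json.dumps(payload, sort_keys=True, default=repr)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached(func):
    """Hypothetical in-memory version of the @cached decorator in the question."""
    store = {}

    @functools.wraps(func)
    def wrapper(model, dataparams):
        key = params_checksum(model, dataparams)
        if key not in store:
            store[key] = func(model, dataparams)
        return store[key]

    return wrapper


class StandInEstimator:
    """Hypothetical stand-in for an untrained scikit-learn estimator."""

    def __init__(self, C=1.0, random_state=0):
        self.C = C
        self.random_state = random_state

    def get_params(self, deep=True):
        return {"C": self.C, "random_state": self.random_state}


@cached
def train(model, dataparams):
    # Placeholder for the expensive fit; returns something cheap here.
    return (type(model).__name__, model.C, dataparams["n_samples"])


first = train(StandInEstimator(C=1.0, random_state=0), {"n_samples": 100})
second = train(StandInEstimator(C=1.0, random_state=0), {"n_samples": 100})
```

Two caveats with this approach: a parameter whose repr() contains a memory address will not hash reproducibly, and with random_state=None two fits share a checksum even though their results may differ, so the key is only meaningful when random_state is fixed.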

G

Sent from my phone. Please forgive brevity and misspelling.

On Dec 13, 2016, at 19:29, Stuart Reynolds <stuart at stuartreynolds.net> wrote:
>I'd like to cache some functions to avoid rebuilding models like so:
>
>    @cached
>    def train(model, dataparams): ...
>
>
>model is an (untrained) scikit-learn object and dataparams is a dict.
>The @cached decorator forms a SHA checksum out of the parameters of the
>function it decorates and returns the previously calculated result if
>the parameters match.
>
>The tricky part here is reliably generating a checksum from the
>parameters. scikit-learn uses Python's pickle
>(http://scikit-learn.org/stable/modules/model_persistence.html), but the
>pickle library is non-deterministic (the same inputs to pickle.dumps
>yield differing output! -- *I know*).
>
>So... any suggestions on how to generate checksums from models in
>Python?
>
>Thanks.
>- Stuart