[scikit-learn] [ANN] Scikit-learn 0.20.0

Fri Sep 28 13:38:39 EDT 2018

On 09/28/2018 12:10 PM, Sebastian Raschka wrote:
>>> I think model serialization should be a priority.
>> There is also the ONNX specification that is gaining industrial adoption and that already includes open source exporters for several families of scikit-learn models:
>>
>> https://github.com/onnx/onnxmltools
>
> Didn't know about that. This is really nice! What do you think about referring to it under http://scikit-learn.org/stable/modules/model_persistence.html to make people aware that this option exists?
> Would be happy to add a PR.
>
>
I don't think an open source runtime has been announced yet (or they 
didn't email me like they promised lol).
I'm quite excited about this as well.

Javier:
The problem is not so much storing the "model" but storing how to make 
predictions. Different versions could act differently
on the same data structure - and the data structure could change. Both 
happen in scikit-learn.
So if you want to make sure the right thing happens across versions, you 
either need to provide serialization and deserialization for
every version and conversion between those or you need to provide a way 
to store the prediction function,
which basically means you need a Turing-complete language (that's what 
ONNX does).
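
To make "store the prediction function" a bit more concrete, here is a toy sketch. The op names, the JSON layout and the helpers export_logistic and run are all made up for illustration; this is not the ONNX format or any scikit-learn API. The point is only that the loader depends on a small, fixed set of ops and not on the library version that trained the model.

```python
import json
import math

# Hypothetical op set; a real format would define a far larger one.
OPS = {
    "dot": lambda a, b: sum(x * y for x, y in zip(a, b)),
    "add": lambda a, b: a + b,
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
}

def export_logistic(coef, intercept):
    """Serialize the prediction steps themselves, not the estimator object."""
    graph = [
        {"op": "dot", "in": ["x", "coef"], "out": "z0"},
        {"op": "add", "in": ["z0", "intercept"], "out": "z1"},
        {"op": "sigmoid", "in": ["z1"], "out": "proba"},
    ]
    return json.dumps({"consts": {"coef": coef, "intercept": intercept},
                       "graph": graph, "output": "proba"})

def run(serialized, x):
    """Evaluate the stored graph; depends only on OPS, not on the training code."""
    model = json.loads(serialized)
    env = dict(model["consts"], x=x)
    for node in model["graph"]:
        args = [env[name] for name in node["in"]]
        env[node["out"]] = OPS[node["op"]](*args)
    return env[model["output"]]

blob = export_logistic([0.5, -0.25], 0.0)
proba = run(blob, [2.0, 4.0])
```

As long as the meaning of each op stays fixed, an old blob keeps producing the same predictions no matter how the estimator's internals change later.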

We basically said doing the first is not feasible within scikit-learn 
given our current amount of resources, and no one
has even tried doing it outside of scikit-learn (which would be possible).
Implementing a complete prediction serialization language (the second 
option) is definitely outside the scope of sklearn.
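
For a sense of what the first option involves, here is a rough sketch of per-version conversion for a single made-up linear model. The layouts, the field names and the MIGRATIONS table are hypothetical and do not describe scikit-learn's actual internals. Every release that changes a layout needs one more converter like these, for every estimator, which is where the cost comes from.

```python
# Hypothetical stored layouts for the same linear model across three releases.
def v1_to_v2(state):
    # Assume release 2 renamed "w" to "coef" and "b" to "intercept".
    return {"version": 2, "coef": state["w"], "intercept": state["b"]}

def v2_to_v3(state):
    # Assume release 3 started storing the number of features explicitly.
    return dict(state, version=3, n_features=len(state["coef"]))

MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}
CURRENT_VERSION = 3

def load(state):
    """Upgrade a stored state one release at a time until it is current."""
    while state["version"] < CURRENT_VERSION:
        state = MIGRATIONS[state["version"]](state)
    return state

old = {"version": 1, "w": [0.5, -0.25], "b": 0.0}
new = load(old)
```

Note that this only converts the data structure. If the prediction code itself changed behaviour between releases, the converters would also have to compensate for that.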