[scikit-learn] [ANN] Scikit-learn 0.20.0

Fri Sep 28 13:41:13 EDT 2018


On 09/28/2018 01:38 PM, Andreas Mueller wrote:
>
>
> On 09/28/2018 12:10 PM, Sebastian Raschka wrote:
>>>> I think model serialization should be a priority.
>>> There is also the ONNX specification that is gaining industrial 
>>> adoption and that already includes open source exporters for several 
>>> families of scikit-learn models:
>>>
>>> https://github.com/onnx/onnxmltools
>>
>> Didn't know about that. This is really nice! What do you think about 
>> referring to it under 
>> http://scikit-learn.org/stable/modules/model_persistence.html to make 
>> people aware that this option exists?
>> Would be happy to add a PR.
>>
>>
> I don't think an open source runtime has been announced yet (or they 
> didn't email me like they promised lol).
> I'm quite excited about this as well.
>
> Javier:
> The problem is not so much storing the "model" but storing how to make 
> predictions. Different versions could act differently
> on the same data structure - and the data structure could change. Both 
> happen in scikit-learn.
> So if you want to make sure the right thing happens across versions, 
> you either need to provide serialization and deserialization for
> every version and conversion between those or you need to provide a 
> way to store the prediction function,
> which basically means you need a Turing-complete language (that's what 
> ONNX does).
>
> We basically said doing the first is not feasible within scikit-learn 
> given our current amount of resources, and no one
> has even tried doing it outside of scikit-learn (which would be 
> possible).
> Implementing a complete prediction serialization language (the second 
> option) is definitely outside the scope of sklearn.
>
>
Maybe we should add to the FAQ why serialization is hard?
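
A toy sketch of the point quoted above, in case it helps for such a FAQ entry. Everything in it is made up for illustration: the JSON layout, the version strings, and the names predict_v1, predict_v2 and load_checked are hypothetical, not scikit-learn API. It only shows that the same stored data structure can yield different predictions under two versions of the prediction code, and that recording the version at least lets you detect the mismatch.

```python
import json

# Hypothetical stored state of a toy linear model (not a scikit-learn format).
state = {"version": "1", "coef": [2.0, -1.0], "intercept": 0.5}
payload = json.dumps(state)


def predict_v1(state, x):
    # Hypothetical "version 1" code: linear score plus intercept.
    return sum(c * v for c, v in zip(state["coef"], x)) + state["intercept"]


def predict_v2(state, x):
    # Hypothetical "version 2" code: same fields, but the intercept is
    # no longer applied here, so the stored model behaves differently.
    return sum(c * v for c, v in zip(state["coef"], x))


def load_checked(payload, running_version):
    # Refuse to load state written by a different (hypothetical) version.
    loaded = json.loads(payload)
    if loaded["version"] != running_version:
        raise RuntimeError(
            "state saved by version %s, but running version %s"
            % (loaded["version"], running_version)
        )
    return loaded


x = [1.0, 3.0]
loaded = load_checked(payload, running_version="1")
print(predict_v1(loaded, x))  # prediction under the version that saved it
print(predict_v2(loaded, x))  # same state, different code, different result
```

The version check only detects the problem. Actually supporting old files would need the per-version conversion code discussed above, which is the part that is hard to maintain.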