[scikit-learn] Announcing sklearn-xarray

Tom Augspurger tom.augspurger88 at gmail.com
Mon Dec 4 11:00:37 EST 2017


I haven't looked at the implementation of `sklearn_xarray.dataarray.wrap`
yet, but a simple test on `dask_ml.preprocessing.StandardScaler` failed
with the (probably expected) `TypeError: 'int' object is not iterable`
when dask-ml attempts an `X.mean(0)`.

I'd be interested to hear what changes dask-ml would need to make to get
things working on dask-backed xarray datasets without reading everything
into memory at once.

The code:


import sklearn_xarray.dataarray as da
from sklearn_xarray.data import load_dummy_dataarray
from dask_ml.preprocessing import StandardScaler

X = load_dummy_dataarray()
Xt = da.wrap(StandardScaler()).fit_transform(X)

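One possible direction, following Olivier's suggestion quoted below, would
be to route all input handling through a single overridable method. What
follows is only a rough sketch of that pattern using the standard library;
`LabeledArray`, `SketchScaler`, `LabelAwareScaler` and `_validate_input`
are hypothetical names made up for illustration and are not part of
scikit-learn, dask-ml or sklearn-xarray.

```python
# Hypothetical stand-ins: none of these names exist in scikit-learn,
# dask-ml or sklearn-xarray.


class LabeledArray:
    """Toy 2-D container with row labels, standing in for a DataArray."""

    def __init__(self, values, labels):
        self.values = [list(row) for row in values]
        self.labels = list(labels)


class SketchScaler:
    """Centers each column; all input handling goes through one hook."""

    def _validate_input(self, X):
        # Default hook: accept nested sequences and return plain floats
        # plus a function that turns the result back into the input type
        # (here the identity, since plain lists carry no labels).
        values = [[float(v) for v in row] for row in X]
        return values, (lambda out: out)

    def fit_transform(self, X):
        values, rewrap = self._validate_input(X)
        n_rows = len(values)
        means = [sum(col) / n_rows for col in zip(*values)]
        centered = [[v - m for v, m in zip(row, means)] for row in values]
        return rewrap(centered)


class LabelAwareScaler(SketchScaler):
    """Third-party subclass that overrides only the validation hook."""

    def _validate_input(self, X):
        if isinstance(X, LabeledArray):
            # Keep the labels around so the output can be re-labeled.
            values, _ = super()._validate_input(X.values)
            return values, (lambda out: LabeledArray(out, X.labels))
        return super()._validate_input(X)


X = LabeledArray([[1.0, 10.0], [3.0, 30.0]], labels=["a", "b"])
plain = SketchScaler().fit_transform(X.values)
labeled = LabelAwareScaler().fit_transform(X)
```

Here `plain` is a bare nested list, while `labeled` is a `LabeledArray`
that still carries the original row labels. The numerical code in
`fit_transform` is shared; only the hook differs between the two classes.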

Tom

On Mon, Dec 4, 2017 at 9:03 AM, Olivier Grisel <olivier.grisel at ensta.org>
wrote:

> Interesting project!
>
> BTW, do you know about dask-ml [1]?
>
> It might be interesting to think about generalizing the input validation
> of fit and predict / transform as a private method of the BaseEstimator
> class instead of directly calling into sklearn.utils.validation functions,
> so as to make it easier for third-party projects such as sklearn-xarray
> and dask-ml to subclass and override those methods to allow for specific
> input data structures without converting everything to a numpy array.
>
> [1] https://github.com/dask/dask-ml
>
>
>
> 2017-12-04 15:21 GMT+01:00 Peter Hausamann <peter.hausamann at tum.de>:
>
>> Hi all,
>>
>> I'd like to announce *sklearn-xarray*, a new package that provides a
>> scikit-learn interface for xarray users. For those not familiar with xarray
>> (http://xarray.pydata.org), it is a "pandas-like and pandas-compatible
>> toolkit for analytics on multi-dimensional arrays".
>>
>> The package makes it possible to apply sklearn estimators to xarray
>> DataArrays and Datasets while keeping the labels (called coordinates in
>> xarray) intact wherever possible.
>>
>> You can install the package via pip:
>>
>> pip install sklearn-xarray
>>
>> To get started, you can:
>>
>>    - read the documentation: https://phausamann.github.io/sklearn-xarray
>>    and
>>    - check out the repository: https://github.com/phausamann/sklearn-xarray
>>
>> Note that the package is still in a very early development stage and
>> there will probably be some major API changes in upcoming releases. Most
>> notably, I'd like to replicate the complete sklearn module structure at
>> some point by decorating all available estimators with the necessary
>> wrappers.
>>
>> Feedback of any kind is appreciated.
>>
>> Peter
>>
>>
>> _______________________________________________
>> scikit-learn mailing list
>> scikit-learn at python.org
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>>
>
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>