From jeremie.du-boisberranger at inria.fr Fri Jan 10 05:56:30 2025
From: jeremie.du-boisberranger at inria.fr (Jérémie du Boisberranger)
Date: Fri, 10 Jan 2025 11:56:30 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.6.1 is online!
Message-ID: <8e866d37-45a1-4db8-a41d-af12d1a2c546@inria.fr>

Hello everyone,

We're happy to announce the 1.6.1 release!

It contains fixes for a few regressions introduced in 1.6.

You can see the changelog here:
https://scikit-learn.org/stable/whats_new/v1.6.html#version-1-6-1

You can upgrade with pip as usual:

```
pip install -U scikit-learn
```

The conda-forge builds can be installed using:

```
conda install -c conda-forge scikit-learn
```

Thanks to everyone who contributed to this release!

Jérémie, on behalf of the Scikit-learn maintainers team.

From olivier.grisel at ensta.org Fri Jan 10 09:45:22 2025
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Fri, 10 Jan 2025 15:45:22 +0100
Subject: [scikit-learn] [ANN] scikit-learn 1.6.1 is online!
In-Reply-To: <8e866d37-45a1-4db8-a41d-af12d1a2c546@inria.fr>
References: <8e866d37-45a1-4db8-a41d-af12d1a2c546@inria.fr>
Message-ID:

Thank you very much, Jérémie and everyone else involved in making this release happen!

--
Olivier
_______________________________________________
scikit-learn mailing list
scikit-learn at python.org
https://mail.python.org/mailman/listinfo/scikit-learn

From bross_phobrain at sonic.net Wed Jan 22 18:23:39 2025
From: bross_phobrain at sonic.net (Bill Ross)
Date: Wed, 22 Jan 2025 15:23:39 -0800
Subject: [scikit-learn] MinMaxScaler scales all (and only all) features in X?
Message-ID:

Hi,

I have a mixture of table data and intermediate vectors from another model, which don't seem to scale productively. The fact that MinMaxScaler seems to scale all features in X makes me wonder if/how people train with such mixed data.

The easy approaches seem to be either to scale the db data and then combine it with the vectors, or just to scale the db columns in place 'by hand'.

Otherwise, I might consider adding a column-list option to the API.

I suspect I'm just missing something important, since I wandered in following this purely tabular example, which seemed good before adding ML-derived vectors:

https://www.kaggle.com/code/carlmcbrideellis/tabular-classification-with-neural-networks-keras

Any advice or more appropriate example to follow would be great.

Thanks,
Bill

--
Phobrain.com

From gael.varoquaux at normalesup.org Thu Jan 23 02:41:07 2025
From: gael.varoquaux at normalesup.org (Gael Varoquaux)
Date: Thu, 23 Jan 2025 08:41:07 +0100
Subject: [scikit-learn] MinMaxScaler scales all (and only all) features in X?
In-Reply-To:
References:
Message-ID:

Hi there,

The way to do what you describe in scikit-learn would be via the ColumnTransformer:
https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html

Note, however, that scikit-learn is mostly designed for multivariate statistics, and thus does not tend to treat columns individually in its transformers.
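
For illustration, the ColumnTransformer suggestion could look roughly like the sketch below. It is not code from this thread: the column layout (three tabular columns followed by four model-derived vector columns), the random data, and the names `table_part`, `vector_part`, `table_cols` and `scale_table` are all assumptions made up for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical layout: the first 3 columns are tabular ("db") features,
# the remaining 4 columns are vectors produced by another model.
rng = np.random.default_rng(0)
table_part = rng.uniform(0, 100, size=(6, 3))
vector_part = rng.normal(size=(6, 4))
X = np.hstack([table_part, vector_part])

table_cols = [0, 1, 2]  # assumed indices of the columns to scale

# Scale only the tabular columns. remainder="passthrough" leaves every
# other column unchanged and appends it after the transformed columns.
preprocess = ColumnTransformer(
    transformers=[("scale_table", MinMaxScaler(), table_cols)],
    remainder="passthrough",
)

X_out = preprocess.fit_transform(X)
print(X_out.shape)  # (6, 7)
```

Note that the output places the transformed columns first and the passed-through columns after them, so the column order only matches the input when the scaled columns already come first, as they do in this assumed layout.
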
Some of us are working on a related package, skrub (https://skrub-data.org), which is more focused on heterogeneous dataframes. It does not currently have anything that would help you much, but we are heavily brainstorming a variety of APIs to do flexible transformations of dataframes, including easily doing what you want. The challenge is to address the variety of cases.

Hope this helps,

Gaël

--
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info
https://bsky.app/profile/gaelvaroquaux.bsky.social

From loic.esteve at ymail.com Thu Jan 23 02:47:02 2025
From: loic.esteve at ymail.com (Loïc Estève)
Date: Thu, 23 Jan 2025 08:47:02 +0100
Subject: [scikit-learn] MinMaxScaler scales all (and only all) features in X?
In-Reply-To: (Bill Ross's message of "Wed, 22 Jan 2025 15:23:39 -0800")
References:
Message-ID: <87jzamynt5.fsf@ymail.com>

Hi,

It feels like you want to use a ColumnTransformer that can apply different preprocessing to different columns, see e.g.
this example:
https://scikit-learn.org/stable/auto_examples/miscellaneous/plot_pipeline_display.html#displaying-a-complex-pipeline-chaining-a-column-transformer

You can use 'passthrough' for the columns you don't want to change.

Cheers,
Loïc

From bross_phobrain at sonic.net Thu Jan 23 04:21:48 2025
From: bross_phobrain at sonic.net (Bill Ross)
Date: Thu, 23 Jan 2025 01:21:48 -0800
Subject: [scikit-learn] MinMaxScaler scales all (and only all) features in X?
In-Reply-To:
References:
Message-ID:

> ColumnTransformer

Thanks!

I was also thinking of trying TabPFN, which I have not researched yet, in case you can comment. Their attribution requirement seems overboard for what I want, unless it's flat-out miraculous for the flat-footed. :-)

> Some of us are working on a related package, skrub
> (https://skrub-data.org), which is more focused on heterogeneous
> dataframes. It does not currently have something that would help you
> much, but we are heavily brain-storming a variety of APIs to do
> flexible transformations of dataframes, including easily doing what
> you want. The challenge is to address the variety of cases.

Those are the storms we want.
I'd love to know whether, how, and which ML tools are helping with that work, if that is appropriate to discuss here.

Regards,
Bill