[scikit-learn] MinMaxScaler scales all (and only all) features in X?

Bill Ross bross_phobrain at sonic.net
Tue Feb 11 04:31:09 EST 2025


I applied ColumnTransformer, but the results are unexpected. It could be
my lack of Python skill, but it seems like the value of p1_1 in the
original should persist at position (0, 0) in the transformed output?

    ------- pre-scale
             p1_1      p1_2      p1_3      p1_4      p2_1  ...   resp1_4   resp2_1   resp2_2   resp2_3   resp2_4
    760  1.382658  1.440719  1.555705  1.120171  1.717319  ...  0.598736  0.659797  0.376331  0.403887  0.390283

    ------- scaled
    [[0.17045455 0.04680535 0.04372197 ... 0.37633118 0.40388673 0.39028345]

Thanks, 

Bill 

Fingers crossed on the formatting. 

column_trans = make_column_transformer(
    (MinMaxScaler(),
     ['order_in_session', 'big_stime', 'big_time', 'load_time',
      'user_time', 'user_time2', 'mouse_down_time', 'mouse_time',
      'mouse_dist', 'mouse_dist2', 'dot_count', 'mouse_dx', 'mouse_dy',
      'mouse_vecx', 'mouse_vecy', 'dot_vec_len', 'mouse_maxv',
      'mouse_maxa', 'mouse_mina', 'mouse_maxj', 'dot_max_vel',
      'dot_max_acc', 'dot_max_jerk', 'dot_start_scrn', 'dot_end_scrn',
      'dot_vec_ang']),
    remainder='passthrough')

print('------- pre-scale')
print(X_train)

X_train = column_trans.fit_transform(X_train)

print('------- scaled')
print(X_train)
print('------- /scaled')

split 414 414
------- pre-scale
         p1_1      p1_2      p1_3      p1_4      p2_1  ...   resp1_4   resp2_1   resp2_2   resp2_3   resp2_4
760  1.382658  1.440719  1.555705  1.120171  1.717319  ...  0.598736  0.659797  0.376331  0.403887  0.390283
218  0.985645  0.532462  0.780601  0.687588  0.781293  ...  0.890886  1.072392  0.536962  0.715136  0.792722
603  0.783806  0.437074  0.694766  0.371121  0.995891  ...  1.055465  1.518875  1.129209  1.201864  1.476702
0    0.501352  0.253304  0.427804  0.283380  0.571035  ...  1.035323  1.621431  0.838613  1.031724  1.131344
604  1.442482  1.019641  0.798387  1.055465  1.518875  ...  2.779447  1.636363  1.212313  1.274595  1.723697

...

------- scaled
[[0.17045455 0.04680535 0.04372197 ... 0.37633118 0.40388673 0.39028345]
 [0.27272727 0.04502229 0.04204036 ... 0.53696203 0.7151355  0.7927222 ]
 [0.30681818 0.04517088 0.04456278 ... 1.1292094  1.201864   1.4767016 ]
 ...
 [0.02272727 0.04457652 0.1680213  ... 1.796316   1.939811   2.1776829 ]
 [0.55681818 0.04546805 0.04176009 ... 0.48330075 0.37375322 0.29931256]
 [0.5        0.04457652 0.04091928 ... 0.6759416  0.7517819  0.8801653 ]]

--

Phobrain.com 

On 2025-01-23 01:21, Bill Ross wrote:

>> ColumnTransformer 
> 
> Thanks! 
> 
> I was also thinking of trying TabPFN, not researched yet, in case you can comment. <peeks/> Their attribution requirement seems overboard for what I want, unless it's flat-out miraculous for the flat-footed. :-)
> 
> Some of us are working on a related package, skrub (https://skrub-data.org), which is more focused on heterogeneous dataframes. It does not currently have something that would help you much, but we are heavily brainstorming a variety of APIs to do flexible transformations of dataframes, including easily doing what you want. The challenge is to address the variety of cases. 
> 
> Those are the storms we want. I'd love to know if/how/which ML tools are helping with that work, if appropriate here. 
> 
> Regards, 
> Bill

