[scikit-learn] combining arrays of features to train an MLP

Mon Dec 19 12:17:13 EST 2016

Thanks, Thomas, that makes sense! Will submit a PR then to update the docstring.

Best,
Sebastian

> On Dec 19, 2016, at 11:06 AM, Thomas Evangelidis <tevang3 at gmail.com> wrote:
> 
> 
> Greetings,
> 
> My dataset consists of objects characterised by structural features that are encoded in a so-called "fingerprint" form. There are several different types of fingerprints, each encapsulating a different type of information. I want to combine two specific types of fingerprints to train an MLP regressor. The first fingerprint is a 2048-bit array of the form:
> 
>  FP1 = array([ 1.,  1.,  0., ...,  0.,  0.,  1.], dtype=float32)
> 
> The second is an array of 60 floats of the form:
> 
> FP2 = array([ 2.77494618,  0.98973243,  0.34638652,  2.88303715,  1.31473857,
>        -0.56627112,  4.78847547,  2.29587913, -0.6786228 ,  4.63391109,
>        ...
>         0.        ,  0.        ,  5.89652792,  0.        ,  0.        ])
> 
> At first I tried to fuse them into a single 1D array of 2048+60 columns, but the predictions of that MLP were worse than those of the two separate MLP models, each trained on one of the two fingerprint types alone. My question: is there a more effective way to combine the two fingerprints that indicates they represent different types of information?
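
One thing worth checking before looking for a different array layout is whether the two blocks are on comparable scales. The following is only a minimal sketch on synthetic data, assuming the real-valued block is standardised before concatenation; the sample count, the random arrays and the small network settings are made up for illustration, and nothing here is confirmed to improve the results described above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n_samples = 40  # hypothetical dataset size

# Synthetic stand-ins for the two fingerprint types described above.
fp1 = rng.randint(0, 2, size=(n_samples, 2048)).astype(np.float32)  # bit vectors
fp2 = rng.normal(scale=3.0, size=(n_samples, 60))                   # real-valued
y = rng.normal(size=n_samples)                                      # made-up targets

# Scale only the real-valued block so its range is comparable to the 0/1 bits.
# (With real data, fit the scaler on the training split only.)
fp2_scaled = StandardScaler().fit_transform(fp2)

# One row per object, 2048 + 60 = 2108 columns.
X = np.hstack([fp1, fp2_scaled])

# Deliberately tiny network and few iterations, just to keep the sketch fast.
mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=50, random_state=0)
mlp.fit(X, y)
pred = mlp.predict(X)
```

The estimator still sees a single flat feature vector per object; the scaling only keeps the 60 real-valued columns from being on a very different scale than the 2048 bits.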
>  
> To this end, I tried to create a 2-row array (1st row with 2048 elements and 2nd row with 60 elements), but sklearn complained:
> 
>     mlp.fit(x_train,y_train)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 618, in fit
>     return self._fit(X, y, incremental=False)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 330, in _fit
>     X, y = self._validate_input(X, y, incremental)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 1264, in _validate_input
>     multi_output=True, y_numeric=True)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
>     ensure_min_features, warn_on_dtype, estimator)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 402, in check_array
>     array = array.astype(np.float64)
> ValueError: setting an array element with a sequence.
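
The error above can be reproduced without scikit-learn. The sketch below assumes that the 2-row array ended up as a NumPy object array holding two rows of unequal length, which is what the `astype(np.float64)` line in the traceback suggests; the array contents are placeholders.

```python
import numpy as np

# One object represented as two rows of unequal length (2048 and 60 elements).
ragged = np.empty(2, dtype=object)
ragged[0] = np.zeros(2048)
ragged[1] = np.zeros(60)

# Rows of different lengths cannot be packed into a rectangular float array,
# so the conversion that check_array performs fails (typically a ValueError).
try:
    ragged.astype(np.float64)
    conversion_failed = False
except (TypeError, ValueError) as exc:
    conversion_failed = True
    print(type(exc).__name__, exc)
```

scikit-learn estimators expect `X` to be a rectangular 2D array of shape `(n_samples, n_features)`, so every object needs the same number of columns.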
> 
> 
> Then I tried to create, for each object of the dataset, a 2D array of size 2x2048 by appending 1988 zeros to the second row so that both rows are of equal size. However, sklearn complained again:
> 
> 
>     mlp.fit(x_train,y_train)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 618, in fit
>     return self._fit(X, y, incremental=False)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 330, in _fit
>     X, y = self._validate_input(X, y, incremental)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 1264, in _validate_input
>     multi_output=True, y_numeric=True)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
>     ensure_min_features, warn_on_dtype, estimator)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 405, in check_array
>     % (array.ndim, estimator_name))
> ValueError: Found array with dim 3. Estimator expected <= 2.
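
Stacking one 2x2048 array per object gives a 3D array of shape `(n_samples, 2, 2048)`, which is what the check above rejects. A minimal sketch of getting back to two dimensions, using a made-up sample count and all-zero placeholder data:

```python
import numpy as np

n_samples = 5  # hypothetical
# Each object as a 2 x 2048 array (second row zero-padded), as in the attempt above.
X3d = np.zeros((n_samples, 2, 2048))

# Flatten each object into a single row: 2 * 2048 = 4096 columns.
X2d = X3d.reshape(n_samples, -1)

# Equivalent information without the padding: 2048 + 60 = 2108 columns.
X_compact = np.hstack([X3d[:, 0, :], X3d[:, 1, :60]])

print(X3d.ndim, X2d.shape, X_compact.shape)
```

Note that the flattened version is just the plain concatenation plus 1988 constant zero columns, so it carries no extra hint that the two fingerprints are of different types.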
> 
> 
> In another case, with two fingerprints that I will call FP3 and FP4, I observed that the MLP regressor built on FP3 yields better results when trained and evaluated on logarithmically transformed experimental values (the values in the y_train and y_test 1D arrays), while the MLP regressor built on FP4 yields better results with the original experimental values. So my second question is: when combining FP3 and FP4 into a single array, is there any way to tell the MLP that the features corresponding to FP3 must reproduce the logarithmic transform of the experimental values, while the features of FP4 must reproduce the original, untransformed values?
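
A single `MLPRegressor` fitted on one concatenated array has one target per output, so it has no per-feature target setting. One alternative, sketched here purely as an assumption and not as an answer given in this thread, is to keep two models (FP3 with a log-transformed target, FP4 with the raw target) and average their predictions on the original scale. The sketch uses `TransformedTargetRegressor`, which requires scikit-learn 0.20 or later (newer than the installation shown in the tracebacks); the data, feature counts and network sizes are made up.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
n_samples = 40  # hypothetical dataset size

# Synthetic stand-ins for FP3, FP4 and strictly positive experimental values.
fp3 = rng.normal(size=(n_samples, 20))
fp4 = rng.normal(size=(n_samples, 30))
y = rng.uniform(1.0, 100.0, size=n_samples)

# FP3 model: trained on log(y); predictions are mapped back with exp.
model_fp3 = TransformedTargetRegressor(
    regressor=MLPRegressor(hidden_layer_sizes=(16,), max_iter=50, random_state=0),
    func=np.log,
    inverse_func=np.exp,
)
model_fp3.fit(fp3, y)

# FP4 model: trained directly on the untransformed values.
model_fp4 = MLPRegressor(hidden_layer_sizes=(16,), max_iter=50, random_state=0)
model_fp4.fit(fp4, y)

# Both predictions are now on the original scale and can be averaged.
pred_fp3 = model_fp3.predict(fp3)
pred_fp4 = model_fp4.predict(fp4)
pred = 0.5 * (pred_fp3 + pred_fp4)
```

Equal weights are an arbitrary choice here; with real data the weights (or a second-stage model on top of the two predictions) would need to be chosen by validation.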
> 
> 
> I would greatly appreciate any advice on either of my two queries.
> Thomas
> 
> -- 
> ======================================================================
> Thomas Evangelidis
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081, 
> 62500 Brno, Czech Republic 
> 
> email: tevang at pharm.uoa.gr
>          	tevang3 at gmail.com
> 
> website: https://sites.google.com/site/thomasevangelidishomepage/
> 