[scikit-learn] combining arrays of features to train an MLP
Thomas Evangelidis
tevang3 at gmail.com
Mon Dec 19 11:06:36 EST 2016
Greetings,
My dataset consists of objects characterised by their structural features,
which are encoded in a so-called "fingerprint" form. There are several
different types of fingerprints, each one encapsulating a different type of
information. I want to combine two specific types of fingerprints to train
an MLP regressor. The first fingerprint is a 2048-bit array of the form:
> FP1 = array([ 1., 1., 0., ..., 0., 0., 1.], dtype=float32)
The second is an array of 60 floating-point numbers of the form:
> FP2 = array([ 2.77494618, 0.98973243, 0.34638652, 2.88303715, 1.31473857,
> -0.56627112, 4.78847547, 2.29587913, -0.6786228 , 4.63391109,
> ...
> 0. , 0. , 5.89652792, 0. , 0. ])
At first I tried to fuse them into a single 1D array of 2048+60 columns, but
the predictions of that MLP were worse than those of either of the two MLP
models trained on a single fingerprint type. My question: is there a more
effective way to combine the two fingerprints that indicates they represent
different types of information?
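A minimal sketch of the flat-concatenation setup described above, on synthetic stand-in data. It assumes (this is not stated in the post) that the 60 float columns were left unscaled; in that case one thing worth trying is standardizing only that block while leaving the bits untouched. The sample count, layer size, and random data are hypothetical, and whether scaling actually helps here is untested.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 200  # hypothetical dataset size

# Synthetic stand-ins for the two fingerprint types (not real data).
fp1 = rng.integers(0, 2, size=(n_samples, 2048)).astype(np.float32)  # bit vector
fp2 = rng.normal(size=(n_samples, 60))                               # float descriptors
y = rng.normal(size=n_samples)                                       # made-up targets

# One row per object, 2048 + 60 = 2108 columns.
X = np.hstack([fp1, fp2])

# Standardize only the 60 continuous columns; pass the bit columns through.
preprocess = ColumnTransformer(
    [("scale_fp2", StandardScaler(), slice(2048, 2108))],
    remainder="passthrough",
)
model = make_pipeline(
    preprocess,
    MLPRegressor(hidden_layer_sizes=(64,), max_iter=200, random_state=0),
)
model.fit(X, y)
pred = model.predict(X)
```

The estimator only sees a 2D matrix of shape (n_samples, n_features); it has no notion of which columns belong to which fingerprint, so any per-block treatment has to happen in preprocessing like this.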
To this end, I tried to create a 2-row array (1st row 2048 elements, 2nd
row 60 elements), but sklearn complained:
> mlp.fit(x_train,y_train)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 618, in fit
>     return self._fit(X, y, incremental=False)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 330, in _fit
>     X, y = self._validate_input(X, y, incremental)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 1264, in _validate_input
>     multi_output=True, y_numeric=True)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
>     ensure_min_features, warn_on_dtype, estimator)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 402, in check_array
>     array = array.astype(np.float64)
> ValueError: setting an array element with a sequence.
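The error above can likely be reproduced with NumPy alone. The sketch below assumes the 2-row sample was stored as an object array, since rows of unequal length cannot form a regular numeric array; the variable names and zero-filled data are placeholders.

```python
import numpy as np

fp1 = np.zeros(2048, dtype=np.float32)  # placeholder bit fingerprint
fp2 = np.zeros(60)                      # placeholder float fingerprint

# Rows of length 2048 and 60 cannot form a rectangular numeric array,
# so the pair can only be held as an array of Python objects.
sample = np.empty(2, dtype=object)
sample[0] = fp1
sample[1] = fp2

# Converting the object array to float64 is the step that fails in the traceback.
try:
    sample.astype(np.float64)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# A flat vector per object converts without trouble.
flat = np.concatenate([fp1, fp2]).astype(np.float64)
```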
Then I tried to create, for each object of the dataset, a 2D array of size
2x2048 by padding the second row with 1988 zeros so that both rows are of
equal size. However, sklearn complained again:
> mlp.fit(x_train,y_train)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 618, in fit
>     return self._fit(X, y, incremental=False)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 330, in _fit
>     X, y = self._validate_input(X, y, incremental)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/neural_network/multilayer_perceptron.py", line 1264, in _validate_input
>     multi_output=True, y_numeric=True)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
>     ensure_min_features, warn_on_dtype, estimator)
>   File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 405, in check_array
>     % (array.ndim, estimator_name))
> ValueError: Found array with dim 3. Estimator expected <= 2.
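The shape arithmetic behind this second error can be sketched with NumPy: one 2x2048 block per object makes the training set three-dimensional, while the estimator accepts at most two dimensions. The data and sample count below are placeholders. Flattening each padded block yields a valid 2D matrix, but it carries the same information as plain concatenation plus a run of all-zero columns.

```python
import numpy as np

n_samples = 5  # hypothetical number of objects

fp1 = np.ones((n_samples, 2048))       # placeholder bit fingerprints
fp2 = np.full((n_samples, 60), 2.0)    # placeholder float fingerprints

# Zero-pad FP2 to 2048 columns and stack: one 2x2048 block per object.
fp2_padded = np.pad(fp2, ((0, 0), (0, 2048 - 60)))
stacked = np.stack([fp1, fp2_padded], axis=1)   # shape (5, 2, 2048), 3 dimensions

# Flattening each block gives a 2D matrix the estimator would accept...
flat_padded = stacked.reshape(n_samples, -1)    # shape (5, 4096)

# ...but it equals plain concatenation followed by padding zeros.
flat = np.hstack([fp1, fp2])                    # shape (5, 2108)
```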
For another pair of fingerprints, let's call them FP3 and FP4, I observed
that the MLP regressor built on FP3 yields better results when trained and
evaluated on logarithmically transformed experimental values (the values in
the y_train and y_test 1D arrays), while the MLP regressor built on FP4
yields better results with the original experimental values. So my second
question is: when combining FP3 and FP4 into a single array, is there any
way to tell the MLP that the features corresponding to FP3 must reproduce
the logarithmic transform of the experimental values, while the features of
FP4 must reproduce the original, untransformed experimental values?
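A single MLPRegressor fits one target for all input columns, so a per-feature-block target transform is not something it exposes. One possible workaround, sketched here as a swapped-in technique rather than an answer to the question as posed, is to keep two models and average their predictions on the original scale. TransformedTargetRegressor trains on log(y) and maps predictions back with exp. The data, array widths, layer sizes, and equal 0.5 weights are all hypothetical, and the targets are assumed strictly positive.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n = 150  # hypothetical dataset size

# Synthetic stand-ins for FP3, FP4 and the experimental values (not real data).
fp3 = rng.normal(size=(n, 20))
fp4 = rng.normal(size=(n, 30))
y = rng.uniform(1.0, 100.0, size=n)  # strictly positive, so log(y) is defined

# FP3 model: fitted on log(y); predictions are mapped back through exp.
model_fp3 = TransformedTargetRegressor(
    regressor=MLPRegressor(hidden_layer_sizes=(32,), max_iter=300, random_state=0),
    func=np.log,
    inverse_func=np.exp,
)
# FP4 model: fitted on the untransformed values.
model_fp4 = MLPRegressor(hidden_layer_sizes=(32,), max_iter=300, random_state=0)

model_fp3.fit(fp3, y)
model_fp4.fit(fp4, y)

# Both predictions are now on the original scale and can be averaged.
pred_fp3 = model_fp3.predict(fp3)
pred_fp4 = model_fp4.predict(fp4)
pred = 0.5 * (pred_fp3 + pred_fp4)
```

The averaging weights would need to be chosen on held-out data; equal weights are only a starting point.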
I would greatly appreciate any advice on either of my two queries.
Thomas
--
======================================================================
Thomas Evangelidis
Research Specialist
CEITEC - Central European Institute of Technology
Masaryk University
Kamenice 5/A35/1S081,
62500 Brno, Czech Republic
email: tevang at pharm.uoa.gr
tevang3 at gmail.com
website: https://sites.google.com/site/thomasevangelidishomepage/