[scikit-learn] sklearn Pipeline: argument of type 'ColumnTransformer' is not iterable

Chamila Wijayarathna cdwijayarathna at gmail.com
Fri May 29 02:30:46 EDT 2020


Hi all,

I did manage to get the code to run using a workaround, which is a bit ugly.

The following is the complete stack trace of the error I was receiving.

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2020.1.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/ASUS/PycharmProjects/swelltest/enemble.py", line 112, in <module>
    ens.fit(x_train,y_train)
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 265, in fit
    return super().fit(X, transformed_y, sample_weight)
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_voting.py", line 65, in fit
    names, clfs = self._validate_estimators()
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\ensemble\_base.py", line 228, in _validate_estimators
    self._validate_names(names)
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in _validate_names
    invalid_names = [name for name in names if '__' in name]
  File "C:\Users\ASUS\PycharmProjects\swelltest\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 77, in <listcomp>
    invalid_names = [name for name in names if '__' in name]
TypeError: argument of type 'ColumnTransformer' is not iterable

The following are the entries in the 'names' list at the time of the error.

1- ColumnTransformer(transformers=[('phy', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())]), ['HR', 'RMSSD', 'SCL'])])

2- ColumnTransformer(transformers=[('fa', Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())]), ['Squality', 'Sneutral', 'Shappy'])])

It seems that the library is attempting to search for the '__' substring in
each ColumnTransformer object, which it is unable to do.
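
A minimal sketch of what seems to be happening, using stand-in classes rather than scikit-learn itself (FakeTransformer, FakePipeline and validate_names are hypothetical names made up for this illustration). It assumes that the estimators list is unpacked with zip(*estimators) before the name check, and that a pipeline can be indexed by position, which would explain why the first step of each pipeline ends up in 'names':

```python
# Stand-ins only: these classes imitate the behaviour assumed above and are
# not part of scikit-learn.
class FakeTransformer:
    """Plays the ColumnTransformer role: not a string, not a container."""


class FakePipeline:
    """Plays the Pipeline role: supports len() and positional indexing."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, step) pairs

    def __len__(self):
        return len(self.steps)

    def __getitem__(self, index):
        # Return only the step object, dropping its name.
        return self.steps[index][1]


def validate_names(estimators):
    """Hypothetical re-creation of the name check seen in the traceback."""
    # Assumption: each entry is split into a name and an estimator.
    names, _ = zip(*estimators)
    invalid_names = [name for name in names if '__' in name]
    if invalid_names:
        raise ValueError('Estimator names must not contain __: got '
                         '{0!r}'.format(invalid_names))
    return list(names)


pipe_phy = FakePipeline([('preprocessor', FakeTransformer()), ('classifier', 'SVM')])
pipe_fa = FakePipeline([('preprocessor', FakeTransformer()), ('classifier', 'SVM')])

# Bare pipelines: zip(*...) iterates each pipeline, so 'names' receives the
# two FakeTransformer objects and the substring test raises TypeError.
try:
    validate_names([pipe_phy, pipe_fa])
except TypeError as exc:
    print('bare pipelines:', type(exc).__name__)

# (name, estimator) pairs: 'names' holds real strings and the check passes.
print('named pairs:', validate_names([('phy', pipe_phy), ('fa', pipe_fa)]))
```

If that reading is right, the documented form of the estimators argument, a list of (name, estimator) pairs such as [('phy', pipe_phy), ('fa', pipe_fa)], would keep real strings in 'names' without any change to the library.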

Since this name check doesn't have a significant effect on my
functionality, I commented out the following snippet in
sklearn\utils\metaestimators.py.

invalid_names = [name for name in names if '__' in name]
if invalid_names:
    raise ValueError('Estimator names must not contain __: got '
                     '{0!r}'.format(invalid_names))

Please let me know if there is a better workaround, or whether there are any
issues with commenting out this code.

Thanks

On Fri, May 29, 2020 at 10:33 AM Chamila Wijayarathna <
cdwijayarathna at gmail.com> wrote:

> Hello all,
>
> I hope I am writing to the correct mailing list about this issue that I am
> having. My apologies if I am not.
>
> I am attempting to use a pipeline to feed an ensemble voting classifier as
> I want the ensemble learner to use models that train on different feature
> sets. For this purpose, I followed the tutorial available at [1].
>
> The following is the code I have developed so far.
>
> y = df1.index
> x = preprocessing.scale(df1)
>
> phy_features = ['A', 'B', 'C']
> phy_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
> phy_processer = ColumnTransformer(transformers=[('phy', phy_transformer, phy_features)])
>
> fa_features = ['D', 'E', 'F']
> fa_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
> fa_processer = ColumnTransformer(transformers=[('fa', fa_transformer, fa_features)])
>
> pipe_phy = Pipeline(steps=[('preprocessor', phy_processer ),('classifier', SVM)])
> pipe_fa = Pipeline(steps=[('preprocessor', fa_processer ),('classifier', SVM)])
>
> ens = VotingClassifier(estimators=[pipe_phy, pipe_fa])
>
> cv = KFold(n_splits=10, random_state=None, shuffle=True)
> for train_index, test_index in cv.split(x):
>     x_train, x_test = x[train_index], x[test_index]
>     y_train, y_test = y[train_index], y[test_index]
>     ens.fit(x_train,y_train)
>     print(ens.score(x_test, y_test))
>
> However, when running the code, I am getting the error "TypeError:
> argument of type 'ColumnTransformer' is not iterable" at the line
> ens.fit(x_train,y_train).
>
> What is the reason for this and how can I fix it?
>
> Thank you,
> Chamila
>


-- 
Chamila Dilshan Wijayarathna,
PhD Research Student
The University of New South Wales (UNSW Canberra)
Australian Centre for Cyber Security
Australian Defence Force Academy
PO Box 7916, Canberra BA ACT 2610
Australia
Mobile:(+61)416895795